Abstract. Additive tree functionals represent the cost of many divide-and-conquer
algorithms. We derive the limiting distribution of the additive functionals
induced by toll functions of the form (a) $n^\alpha$ when $\alpha > 0$ and (b) $\log n$
(the so-called shape functional) on uniformly distributed binary trees, sometimes
called Catalan trees. The Gaussian law obtained in the latter case
complements the central limit theorem for the shape functional under the random
permutation model. Our results give rise to an apparently new family of
distributions containing the Airy distribution ($\alpha = 1$) and the normal
distribution [case (b), and case (a) as $\alpha \downarrow 0$]. The main theoretical tools employed
are recent results relating asymptotics of the generating functions of sequences
to those of their Hadamard product, and the method of moments.



1. Introduction 

Binary trees are fundamental data structures in computer science, with primary
application in searching and sorting. For background we refer the reader
to Chapter 2 of the excellent book [T8]. In this article we consider additive
functionals defined on uniformly distributed binary trees (sometimes called Catalan
trees) induced by two types of toll sequences [$(n^\alpha)$ and $(\log n)$]. (See the simple
Definition 2.1.) Our main results, Theorems 3.10 and 4.2, establish the limiting
distribution for these induced functionals.

A competing model of randomness for binary trees — one used for binary search 
trees — is the random permutation model (RPM); see Section 2.3 of [T8]. While there 
has been much study of additive functionals under the RPM (see, for example, [TSj, 
Section 3.3] and pH EE ECS Hj), little attention has been paid to the distribution of 
functionals defined on binary trees under the uniform (Catalan) model of randomness.
Fill [5] argued that the functional corresponding to the toll sequence $(\log n)$
serves as a crude measure of the "shape" of a binary tree, and explained how this
functional arises in connection with the move-to-root self-organizing scheme for
dynamic maintenance of binary search trees. He derived a central limit theorem under
the RPM, but obtained only asymptotic information about the mean and variance
under the Catalan model. (The latter results were rederived in the extension [19]
from binary trees to simply generated rooted trees.) In this paper (Theorem 4.2)
we show that there is again asymptotic normality under the Catalan model.

In [TT1 Prop. 2] Flajolet and Steyaert gave order-of-growth information about 
the mean of functionals induced by tolls of the form $n^\alpha$. (The motivation is to
build a "repertoire" of tolls from which the behavior of more complicated tolls can 
be deduced by combining elements from the repertoire. The corresponding results 
under the random permutation model were derived by Neininger [20].) Takacs 
established the limiting (Airy) distribution of path length in Catalan trees [231 EH
|25], which is the additive functional for the toll $n - 1$. The additive functional for
the toll $n^2$ arises in the study of the Wiener index of the tree and has been analyzed
by Janson [15]. In this paper (Theorem 3.10) we obtain the limiting distribution
for Catalan trees for toll $n^\alpha$ for any $\alpha > 0$. The family of limiting distributions
appears to be new. In most cases we have a description of the distribution only in 
terms of its moments, although other descriptions in terms of Brownian excursion, 
as for the Airy distribution and the limiting distribution for the Wiener index, may 
be possible. This is currently under investigation by the authors in collaboration 
with others. 

The uniform model on binary trees has also been used recently by Janson [14] in 
the analysis of an algorithm of Koda and Ruskey [17] for listing ideals in a forest 
poset. 

This paper serves as the first example of the application of recent results [6], 
extending singularity analysis [10], to obtain limiting distributions. In [6], it is 
shown how the asymptotics of generating functions of sequences relate to those 
of their Hadamard product. First moments for our problems were treated in [6] 
and a sketch of the technique we employ was presented there. (Our approach to 
obtaining asymptotics of Hadamard products of generating functions differs only 
marginally from the Zigzag Algorithm as presented in [6].) As will be evident soon, 
Hadamard products occur naturally when one is analyzing moments of additive 
tree functionals. The program we carry out allows a fairly mechanical derivation 
of the asymptotics of moments of each order, thereby facilitating application of the 
method of moments. Indeed, preliminary investigations suggest that the techniques 
we develop are likewise applicable to the wider class of simply generated trees; this 
is work in progress. 

The organization of this paper is as follows. Section 2 establishes notation and
states certain preliminaries that will be used in the subsequent proofs. In Section 3
we consider the toll sequence $(n^\alpha)$ for general $\alpha > 0$. In Section 3.1 we compute the
asymptotics of the mean of the corresponding additive functional. In Section 3.2
the analysis diverges slightly as the nature of asymptotics of the higher moments
differs depending on the value of $\alpha$. Section 3.3 employs singularity analysis [10] to
derive the asymptotics of moments of each order. In Section 3.4 we use the results
of Section 3.3 and the method of moments to derive the limiting distribution of
the additive tree functional. In Section 4 we employ the approach again to obtain
a normal limit theorem for the shape functional. Finally, in Section 5 we present
heuristic arguments that may lead to the identification of toll sequences giving rise
to a normal limit.

2. Notation and Preliminaries

2.1. Additive tree functionals. We first establish some notation. Let $T$ be a
binary tree. We use $|T|$ to denote the number of nodes in $T$. Let $L(T)$ and $R(T)$
denote, respectively, the left and right subtrees rooted at the children of the root
of $T$.

Definition 2.1. A functional $f$ on binary trees is called an additive tree functional
if it satisfies the recurrence

$$f(T) = f(L(T)) + f(R(T)) + b_{|T|},$$

for any tree $T$ with $|T| \ge 1$. Here $(b_n)_{n \ge 1}$ is a given sequence, henceforth called
the toll function.

We analyze additive functionals defined on binary trees uniformly distributed
over $\{T : |T| = n\}$ for given $n$. Let $X_n$ be such an additive functional induced by
the toll sequence $(b_n)$. It is well known that the number of binary trees on $n$ nodes
is counted by the $n$th Catalan number

$$\beta_n := \frac{1}{n+1}\binom{2n}{n},$$

with generating function

$$\mathrm{CAT}(z) := \sum_{n=0}^{\infty} \beta_n z^n = \frac{1}{2z}\left(1 - \sqrt{1 - 4z}\right).$$

In our subsequent analysis we will make use of the identity

$$z\,\mathrm{CAT}^2(z) = \mathrm{CAT}(z) - 1. \tag{2.1}$$
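
As a quick sanity check on identity (2.1), the following minimal Python sketch compares power-series coefficients of both sides. For $n \ge 1$ the identity is equivalent to the Catalan convolution $\beta_n = \sum_{j=0}^{n-1} \beta_j \beta_{n-1-j}$; the function names `catalan` and `check_identity` are illustrative and not part of the paper.

```python
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number: the number of binary trees on n nodes."""
    return comb(2 * n, n) // (n + 1)


def check_identity(n_max: int) -> bool:
    """Compare [z^n] of z*CAT(z)^2 and CAT(z) - 1 for 1 <= n <= n_max."""
    beta = [catalan(n) for n in range(n_max + 1)]
    for n in range(1, n_max + 1):
        # [z^n] z*CAT(z)^2 is the convolution of beta with itself at index n-1.
        conv = sum(beta[j] * beta[n - 1 - j] for j in range(n))
        # [z^n] (CAT(z) - 1) is simply beta_n when n >= 1.
        if conv != beta[n]:
            return False
    return True
```

The constant terms also agree, since both sides vanish at $z = 0$.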

The mean of the cost function $a_n := E X_n$ can be obtained recursively by
conditioning on the size of $L(T)$ as

$$a_n = \sum_{j=1}^{n} \frac{\beta_{j-1}\beta_{n-j}}{\beta_n}\,(a_{j-1} + a_{n-j}) + b_n, \qquad n \ge 1.$$

This recurrence can be rewritten as

$$(\beta_n a_n) = 2\sum_{j=1}^{n} (\beta_{j-1} a_{j-1})\,\beta_{n-j} + (\beta_n b_n), \qquad n \ge 1. \tag{2.2}$$
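
The recurrence (2.2) can be checked against a direct enumeration of all binary trees of a given size. The sketch below does this in exact rational arithmetic. It assumes that the functional of the empty tree is 0 (so that $a_0 = 0$), represents a tree as `None` (empty) or a pair `(left, right)`, and uses illustrative function names that do not come from the paper.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def catalan(n):
    return comb(2 * n, n) // (n + 1)


def mean_by_recurrence(toll, n_max):
    """Return [a_0, ..., a_{n_max}] computed from recurrence (2.2)."""
    beta = [catalan(n) for n in range(n_max + 1)]
    w = [Fraction(0)]  # w[n] = beta_n * a_n, with a_0 = 0 (assumption)
    for n in range(1, n_max + 1):
        conv = sum(w[j - 1] * beta[n - j] for j in range(1, n + 1))
        w.append(2 * conv + beta[n] * Fraction(toll(n)))
    return [w[n] / beta[n] for n in range(n_max + 1)]


@lru_cache(maxsize=None)
def all_trees(n):
    """All binary trees with n nodes; None is empty, (L, R) is a node."""
    if n == 0:
        return [None]
    return [(l, r)
            for j in range(n)
            for l in all_trees(j)
            for r in all_trees(n - 1 - j)]


def size(t):
    return 0 if t is None else 1 + size(t[0]) + size(t[1])


def functional(t, toll):
    """Additive functional of Definition 2.1, with value 0 on the empty tree."""
    if t is None:
        return 0
    return functional(t[0], toll) + functional(t[1], toll) + toll(size(t))


def mean_by_enumeration(toll, n):
    trees = all_trees(n)
    return Fraction(sum(functional(t, toll) for t in trees), len(trees))
```

For example, with the path-length toll `lambda n: n - 1` the two computations agree for every small `n`; the enumeration is only practical for small sizes because the number of trees grows like $4^n$.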

Recall that the Hadamard product of two power series $F$ and $G$, denoted by
$F(z) \odot G(z)$, is the power series defined by

$$(F \odot G)(z) \equiv F(z) \odot G(z) := \sum_{n} f_n g_n z^n,$$

where

$$F(z) = \sum_{n} f_n z^n \quad\text{and}\quad G(z) = \sum_{n} g_n z^n.$$

Multiplying (2.2) by $z^n/4^n$ and summing over $n \ge 1$ we get

$$A(z) \odot \mathrm{CAT}(z/4) = \frac{B(z) \odot \mathrm{CAT}(z/4)}{\sqrt{1 - z}}, \tag{2.3}$$

where $A(z)$ and $B(z)$ are the ordinary generating functions of $(a_n)$ and $(b_n)$
respectively.
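
Identity (2.3) can be verified coefficient by coefficient. In the sketch below, `u[n]` is $\beta_n a_n / 4^n$ obtained from recurrence (2.2), and the right side is expanded using the standard fact $[z^i](1-z)^{-1/2} = \binom{2i}{i}/4^i$. The function name and the convention $a_0 = b_0 = 0$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def check_generating_identity(toll, n_max):
    """Check (2.3) up to z^n_max using exact rational arithmetic."""
    # p[n] = beta_n / 4^n, the coefficients of CAT(z/4).
    p = [Fraction(comb(2 * n, n), (n + 1) * 4 ** n) for n in range(n_max + 1)]

    # Left side: u[n] = beta_n a_n / 4^n, from (2.2) divided by 4^n.
    u = [Fraction(0)]
    for n in range(1, n_max + 1):
        conv = sum(u[j - 1] * p[n - j] for j in range(1, n + 1))
        u.append(conv / 2 + p[n] * toll(n))

    # Right side: (B ⊙ CAT(z/4)) times (1 - z)^(-1/2), as a Cauchy product.
    v = [Fraction(0)] + [p[n] * toll(n) for n in range(1, n_max + 1)]
    s = [Fraction(comb(2 * i, i), 4 ** i) for i in range(n_max + 1)]
    rhs = [sum(v[m] * s[n - m] for m in range(n + 1))
           for n in range(n_max + 1)]
    return u == rhs
```

The check passes for any integer-valued toll, reflecting that (2.3) is an exact algebraic consequence of (2.2) rather than an asymptotic statement.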
Remark 2.2. Catalan numbers are ubiquitous in combinatorial applications; see [22]
for a list of 66 instances and http://www-math.mit.edu/~rstan/ec/ for more.

In the sequel the notation [• • •] is used both for Iverson's convention pJJ], 1.2.3(16)] 
and for the coefficient of certain terms in the succeeding expression. The
interpretation will be clear from the context. For example, $[\alpha > 0]$ has the value 1 when
$\alpha > 0$ and the value 0 otherwise. In contrast, $[z^n]F(z)$ denotes the coefficient of
$z^n$ in the series expansion of $F(z)$. Throughout this paper $\Gamma$ and $\zeta$ denote Euler's
gamma function and the Riemann zeta function, respectively.

2.2. Singularity analysis. Singularity analysis is a systematic complex-analytic
technique that relates asymptotics of sequences to singularities of their generating
functions. The applicability of singularity analysis rests on the technical condition
of $\Delta$-regularity. Here is the definition. See [6] or [10] for further background.

Definition 2.3. A function defined by a Taylor series about the origin with radius
of convergence equal to 1 is $\Delta$-regular if it can be analytically continued in a domain

$$\Delta(\phi, \eta) := \{z : |z| < 1 + \eta,\ |\arg(z - 1)| > \phi\},$$

for some $\eta > 0$ and $0 < \phi < \pi/2$. A function $f$ is said to admit a singular expansion
at $z = 1$ if it is $\Delta$-regular and

$$f(z) = \sum_{j=0}^{J} c_j (1 - z)^{\alpha_j} + O(|1 - z|^{A})$$

uniformly in $z \in \Delta(\phi, \eta)$, for a sequence of complex numbers $(c_j)_{0 \le j \le J}$ and an
increasing sequence of real numbers $(\alpha_j)_{0 \le j \le J}$ satisfying $\alpha_j < A$. It is said to
satisfy a singular expansion "with logarithmic terms" if, similarly,

$$f(z) = \sum_{j=0}^{J} c_j(L(z))\,(1 - z)^{\alpha_j} + O(|1 - z|^{A}), \qquad L(z) := \log\frac{1}{1 - z},$$

where each $c_j(\cdot)$ is a polynomial.

Following established terminology, when a function has a singular expansion with 
logarithmic terms we shall say that it is amenable to singularity analysis. 
Recall the definition of the generalized polylogarithm: 

Definition 2.4. For $\alpha$ an arbitrary complex number and $r$ a nonnegative integer,
the generalized polylogarithm function $\mathrm{Li}_{\alpha,r}$ is defined for $|z| < 1$ by

$$\mathrm{Li}_{\alpha,r}(z) := \sum_{n=1}^{\infty} \frac{(\log n)^r}{n^{\alpha}}\, z^n.$$

The key property of the generalized polylogarithm that we will employ is

$$\mathrm{Li}_{\alpha,r} \odot \mathrm{Li}_{\beta,s} = \mathrm{Li}_{\alpha+\beta,\,r+s}.$$

We will also make extensive use of the following consequences of the singular
expansion of the generalized polylogarithm. Neither this lemma nor the ones
following make any claims about uniformity in $\alpha$ or $r$. Note that
$\mathrm{Li}_{1,0}(z) = L(z) = \log((1 - z)^{-1})$.
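Lemma 2.5. For any real $\alpha < 1$ and nonnegative integer $r$, we have the singular
expansion

$$\mathrm{Li}_{\alpha,r}(z) = \sum_{k=0}^{r} \lambda_k^{(\alpha,r)} (1 - z)^{\alpha-1} L^{r-k}(z) + O(|1 - z|^{\alpha-\epsilon}) + (-1)^r \zeta^{(r)}(\alpha)[\alpha > 0],$$

where $\lambda_k^{(\alpha,r)} \equiv \binom{r}{k}\Gamma^{(k)}(1 - \alpha)$ and $\epsilon > 0$ is arbitrarily small.

Proof. By Theorem 1 in [8],

$$\mathrm{Li}_{\alpha,0}(z) \sim \Gamma(1 - \alpha)\,t^{\alpha-1} + \sum_{j \ge 0} \frac{(-1)^j}{j!}\,\zeta(\alpha - j)\,t^j, \qquad t = -\log z = \sum_{l=1}^{\infty} \frac{(1 - z)^l}{l}, \tag{2.4}$$

and for any positive integer $r$,

$$\mathrm{Li}_{\alpha,r}(z) = (-1)^r \frac{\partial^r}{\partial\alpha^r}\,\mathrm{Li}_{\alpha,0}(z).$$

Moreover, as also shown in [8], the singular expansion for $\mathrm{Li}_{\alpha,r}$ is obtained by
performing the indicated differentiation of (2.4) term-by-term. To establish the
claim we set $f = \Gamma(1 - \alpha)$ and $g = t^{\alpha-1}$ in the general formula for the $r$th derivative
of a product:

$$(fg)^{(r)} = \sum_{k=0}^{r} \binom{r}{k} f^{(k)} g^{(r-k)}$$

to first obtain

$$(-1)^r \frac{\partial^r}{\partial\alpha^r}\left[\Gamma(1 - \alpha)\,t^{\alpha-1}\right] = \sum_{k=0}^{r} \binom{r}{k}\,\Gamma^{(k)}(1 - \alpha)\,t^{\alpha-1}\left(\log\frac{1}{t}\right)^{r-k}.$$

The claim then follows easily. □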

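The following "inverse" of Lemma 2.5 is very useful for computing with Hadamard
products.

Lemma 2.6. For any real $\alpha < 1$ and nonnegative integer $r$, there exists a
region $\Delta(\phi, \eta)$ as in Definition 2.3 such that

$$(1 - z)^{\alpha-1} L^r(z) = \sum_{k=0}^{r} \mu_k^{(\alpha,r)}\,\mathrm{Li}_{\alpha,r-k}(z) + O(|1 - z|^{\alpha-\epsilon}) + c_r(\alpha)[\alpha > 0]$$

holds uniformly in $z \in \Delta(\phi, \eta)$, where $\mu_0^{(\alpha,r)} = 1/\Gamma(1 - \alpha)$, $c_r(\alpha)$ is a constant, and
$\epsilon > 0$ is arbitrarily small.

Proof. We use induction on $r$. For $r = 0$ we have

$$\mathrm{Li}_{\alpha,0}(z) = \Gamma(1 - \alpha)(1 - z)^{\alpha-1} + O(|1 - z|^{\alpha-\epsilon}) + \zeta(\alpha)[\alpha > 0]$$

and the claim is verified with

$$\mu_0^{(\alpha,0)} = \frac{1}{\Gamma(1 - \alpha)} \quad\text{and}\quad c_0(\alpha) = -\frac{\zeta(\alpha)}{\Gamma(1 - \alpha)}.$$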
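Let $r \ge 1$. Then using Lemma 2.5 and the induction hypothesis we get

$$\begin{aligned}
\mathrm{Li}_{\alpha,r}(z)
&= \Gamma(1 - \alpha)(1 - z)^{\alpha-1} L^r(z) \\
&\quad + \sum_{k=1}^{r} \lambda_k^{(\alpha,r)} \left[ \sum_{l=0}^{r-k} \mu_l^{(\alpha,r-k)}\,\mathrm{Li}_{\alpha,r-k-l}(z) + O(|1 - z|^{\alpha-\epsilon}) + c_{r-k}(\alpha)[\alpha > 0] \right] \\
&\quad + O(|1 - z|^{\alpha-\epsilon}) + (-1)^r \zeta^{(r)}(\alpha)[\alpha > 0] \\
&= \Gamma(1 - \alpha)(1 - z)^{\alpha-1} L^r(z) + \sum_{k=1}^{r} \lambda_k^{(\alpha,r)} \sum_{s=0}^{r-k} \mu_{r-k-s}^{(\alpha,r-k)}\,\mathrm{Li}_{\alpha,s}(z) \\
&\quad + O(|1 - z|^{\alpha-\epsilon}) + \left( \sum_{k=1}^{r} \lambda_k^{(\alpha,r)} c_{r-k}(\alpha) + (-1)^r \zeta^{(r)}(\alpha) \right)[\alpha > 0] \\
&= \Gamma(1 - \alpha)(1 - z)^{\alpha-1} L^r(z) + \sum_{s=0}^{r-1} \nu_s^{(\alpha,r)}\,\mathrm{Li}_{\alpha,s}(z) \\
&\quad + O(|1 - z|^{\alpha-\epsilon}) + \gamma_r(\alpha)[\alpha > 0],
\end{aligned}$$

where, for $0 \le s \le r - 1$,

$$\nu_s^{(\alpha,r)} := \sum_{k=1}^{r-s} \lambda_k^{(\alpha,r)}\,\mu_{r-s-k}^{(\alpha,r-k)},$$

and where

$$\gamma_r(\alpha) := \sum_{k=1}^{r} \lambda_k^{(\alpha,r)}\,c_{r-k}(\alpha) + (-1)^r \zeta^{(r)}(\alpha).$$

Setting

$$\mu_0^{(\alpha,r)} = \frac{1}{\Gamma(1 - \alpha)}, \qquad \mu_k^{(\alpha,r)} = -\frac{\nu_{r-k}^{(\alpha,r)}}{\Gamma(1 - \alpha)}, \quad 1 \le k \le r,$$

and

$$c_r(\alpha) = -\frac{\gamma_r(\alpha)}{\Gamma(1 - \alpha)},$$

the result follows. □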

For the calculation of the mean, the following refinement of a special case of
Lemma 2.5 is required. It is a simple consequence of Theorem 1 of [8].

Lemma 2.7. When $\alpha < 0$, we have the singular expansion

$$\mathrm{Li}_{\alpha,0}(z) = \Gamma(1 - \alpha)(1 - z)^{\alpha-1} - \Gamma(1 - \alpha)\,\frac{1 - \alpha}{2}\,(1 - z)^{\alpha} + O(|1 - z|^{\alpha+1}) + \zeta(\alpha)[\alpha > -1].$$

For the sake of completeness, we state a result of particular relevance from [6].

Theorem 2.8. If $f$ and $g$ are amenable to singularity analysis and

$$f(z) = O(|1 - z|^{a}) \quad\text{and}\quad g(z) = O(|1 - z|^{b})$$

as $z \to 1$, then $f \odot g$ is also amenable to singularity analysis. Furthermore
(a) If $a + b + 1 < 0$ then

$$f(z) \odot g(z) = O(|1 - z|^{a+b+1}).$$

(b) If $k < a + b + 1 < k + 1$ for some integer $-1 \le k < \infty$, then

$$f(z) \odot g(z) = \sum_{j=0}^{k} \frac{(-1)^j}{j!}\,(f \odot g)^{(j)}(1)\,(1 - z)^j + O(|1 - z|^{a+b+1}).$$

(c) If $a + b + 1$ is a nonnegative integer then

$$f(z) \odot g(z) = \sum_{j=0}^{a+b} \frac{(-1)^j}{j!}\,(f \odot g)^{(j)}(1)\,(1 - z)^j + O(|1 - z|^{a+b+1}\,|L(z)|).$$



3. The toll sequence $(n^\alpha)$

In this section we consider additive functionals when the toll function $b_n$ is $n^\alpha$
with $\alpha > 0$.

3.1. Asymptotics of the mean. The main result of this Section 3.1 is a singular
expansion for $A(z) \odot \mathrm{CAT}(z/4)$. The result is (3.1), (3.4), or (3.5) according as
$\alpha < 1/2$, $\alpha = 1/2$, or $\alpha > 1/2$.

Since $b_n = n^\alpha$, by definition $B = \mathrm{Li}_{-\alpha,0}$. Thus, by Lemma 2.7,

$$B(z) = \Gamma(1 + \alpha)(1 - z)^{-\alpha-1} - \Gamma(1 + \alpha)\,\frac{\alpha + 1}{2}\,(1 - z)^{-\alpha} + O(|1 - z|^{-\alpha+1}) + \zeta(-\alpha)[\alpha < 1].$$

We will now use (2.3) to obtain the asymptotics of the mean.

First we treat the case $\alpha < 1/2$. From the singular expansion $\mathrm{CAT}(z/4) =
2 + O(|1 - z|^{1/2})$ as $z \to 1$, we have, by part (b) of Theorem 2.8,

$$B(z) \odot \mathrm{CAT}(z/4) = C_0 + O(|1 - z|^{-\alpha+\frac12}),$$

where

$$C_0 := B(z) \odot \mathrm{CAT}(z/4)\Big|_{z=1} = \sum_{n=1}^{\infty} n^{\alpha}\,\frac{\beta_n}{4^n}.$$

We now already know the constant term in the singular expansion of $B(z) \odot
\mathrm{CAT}(z/4)$ at $z = 1$ and henceforth we need only compute lower-order terms. The
constant $c$ is used in the sequel to denote an unspecified (possibly 0) constant,
possibly different at each appearance.

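Let us write $B(z) = L_1(z) + R_1(z)$, and $\mathrm{CAT}(z/4) = L_2(z) + R_2(z)$, where

$$\begin{aligned}
L_1(z) &:= \Gamma(1 + \alpha)(1 - z)^{-\alpha-1} - \Gamma(1 + \alpha)\,\frac{\alpha + 1}{2}\,(1 - z)^{-\alpha} + \zeta(-\alpha), \\
R_1(z) &:= B(z) - L_1(z) = O(|1 - z|^{1-\alpha}), \\
L_2(z) &:= 2\left(1 - (1 - z)^{1/2}\right), \\
R_2(z) &:= \mathrm{CAT}(z/4) - L_2(z) = O(|1 - z|).
\end{aligned}$$

We will analyze each of the four Hadamard products separately. First,

$$L_1(z) \odot L_2(z) = -2\Gamma(1 + \alpha)\left[(1 - z)^{-\alpha-1} \odot (1 - z)^{1/2}\right] + 2\Gamma(1 + \alpha)\,\frac{\alpha + 1}{2}\left[(1 - z)^{-\alpha} \odot (1 - z)^{1/2}\right] + c.$$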
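By Theorem 4.1 of [6],

$$(1 - z)^{-\alpha-1} \odot (1 - z)^{1/2} = c + \frac{\Gamma(\alpha - \frac12)}{\Gamma(\alpha + 1)\,\Gamma(-1/2)}\,(1 - z)^{-\alpha+\frac12} + O(|1 - z|),$$

and

$$(1 - z)^{-\alpha} \odot (1 - z)^{1/2} = c + O(|1 - z|)$$

by another application of part (b) of Theorem 2.8, this time with $k = 1$. Hence

$$L_1(z) \odot L_2(z) = \left[L_1(z) \odot L_2(z)\right]\Big|_{z=1} + \frac{\Gamma(\alpha - \frac12)}{\sqrt{\pi}}\,(1 - z)^{-\alpha+\frac12} + O(|1 - z|).$$

The other three Hadamard products are easily handled as

$$\begin{aligned}
L_1(z) \odot R_2(z) &= \left[L_1(z) \odot R_2(z)\right]\Big|_{z=1} + O(|1 - z|^{1-\alpha}), \\
L_2(z) \odot R_1(z) &= \left[L_2(z) \odot R_1(z)\right]\Big|_{z=1} + O(|1 - z|), \\
R_1(z) \odot R_2(z) &= \left[R_1(z) \odot R_2(z)\right]\Big|_{z=1} + O(|1 - z|).
\end{aligned}$$

Putting everything together, we get

$$B(z) \odot \mathrm{CAT}(z/4) = C_0 + \frac{\Gamma(\alpha - \frac12)}{\sqrt{\pi}}\,(1 - z)^{-\alpha+\frac12} + O(|1 - z|^{-\alpha+1}).$$

Using this in (2.3), we get

$$A(z) \odot \mathrm{CAT}(z/4) = C_0 (1 - z)^{-1/2} + \frac{\Gamma(\alpha - \frac12)}{\sqrt{\pi}}\,(1 - z)^{-\alpha} + O(|1 - z|^{-\alpha+\frac12}). \tag{3.1}$$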

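To treat the case $\alpha \ge 1/2$ we make use of the estimate

$$(1 - z)^{1/2} = \frac{1}{\Gamma(-1/2)}\left[\mathrm{Li}_{3/2,0}(z) - \zeta(3/2)\right] + O(|1 - z|), \tag{3.2}$$

a consequence of Theorem 1 of [8], so that

$$B(z) \odot (1 - z)^{1/2} = \mathrm{Li}_{-\alpha,0}(z) \odot (1 - z)^{1/2} = \frac{1}{\Gamma(-1/2)}\,\mathrm{Li}_{\frac32-\alpha,0}(z) + R(z),$$

where

$$R(z) = \begin{cases}
c + O(|1 - z|^{1-\alpha}) & 1/2 \le \alpha < 1, \\
O(|L(z)|) & \alpha = 1, \\
O(|1 - z|^{1-\alpha}) & \alpha > 1.
\end{cases} \tag{3.3}$$

Hence

$$B(z) \odot \mathrm{CAT}(z/4) = -\frac{2}{\Gamma(-1/2)}\,\mathrm{Li}_{\frac32-\alpha,0}(z) + \tilde{R}(z),$$

where $\tilde{R}$, like $R$, satisfies (3.3) (with a possibly different $c$). When $\alpha = 1/2$, this
gives us

$$B(z) \odot \mathrm{CAT}(z/4) = -\frac{2}{\Gamma(-1/2)}\,L(z) + c + O(|1 - z|^{1/2}),$$

so that

$$A(z) \odot \mathrm{CAT}(z/4) = \frac{1}{\sqrt{\pi}}\,(1 - z)^{-1/2} L(z) + c\,(1 - z)^{-1/2} + O(1). \tag{3.4}$$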
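For $\alpha > 1/2$ another singular expansion leads to the conclusion that

$$A(z) \odot \mathrm{CAT}(z/4) = \frac{\Gamma(\alpha - \frac12)}{\sqrt{\pi}}\,(1 - z)^{-\alpha} + \widehat{R}(z), \tag{3.5}$$

where

$$\widehat{R}(z) = \begin{cases}
O(|1 - z|^{-\frac12}) & 1/2 < \alpha < 1, \\
O(|1 - z|^{-\frac12}\,|L(z)|) & \alpha = 1, \\
O(|1 - z|^{-\alpha+\frac12}) & \alpha > 1.
\end{cases}$$

We defer deriving the asymptotics of $a_n$ until Sections 3.2–3.3.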

3.2. Higher moments. We will analyze separately the cases $0 < \alpha < 1/2$, $\alpha = 1/2$,
and $\alpha > 1/2$. The reason for this will become evident soon; though the technique
used to derive the asymptotics is induction in each case, the induction hypothesis
is different for each of these cases.

3.2.1. Small toll functions ($0 < \alpha < 1/2$). We start by restricting ourselves to tolls
of the form $n^\alpha$ where $0 < \alpha < 1/2$. In this case we observe that by singularity
analysis applied to (3.1),

$$\frac{\beta_n a_n}{4^n} = \frac{C_0}{\sqrt{\pi}}\,n^{-1/2} + O(n^{-3/2}) + O(n^{\alpha-1}) = \frac{C_0}{\sqrt{\pi}}\,n^{-1/2} + O(n^{\alpha-1}),$$

so

$$a_n = \sqrt{\pi}\,n^{3/2}\left[1 + O(n^{-1})\right]\left[\frac{C_0}{\sqrt{\pi}}\,n^{-1/2} + O(n^{\alpha-1})\right] = C_0 n + O(n^{\alpha+\frac12}) = (C_0 + o(1))(n + 1).$$

The lead-order term of the mean $a_n = E X_n$ is thus linear, irrespective of the value
of $0 < \alpha < 1/2$ (though the coefficient $C_0$ does depend on $\alpha$). We next perform an
approximate centering to get to further dependence on $\alpha$.

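Define $\tilde{X}_n := X_n - C_0(n + 1)$, with $X_0 := 0$; $\tilde{\mu}_n(k) := E \tilde{X}_n^k$, with $\tilde{\mu}_n(0) = 1$
for all $n \ge 0$; and $\hat{\mu}_n(k) := \beta_n \tilde{\mu}_n(k)/4^n$. Let $\widehat{M}_k(z)$ denote the ordinary generating
function of $\hat{\mu}_n(k)$ in the argument $n$.

By an argument similar to the one that led to (2.2), we get, for $k \ge 2$,

$$\hat{\mu}_n(k) = \frac12 \sum_{j=1}^{n} \frac{\beta_{n-j}}{4^{n-j}}\,\hat{\mu}_{j-1}(k) + \hat{r}_n(k), \qquad n \ge 1,$$

where

$$\begin{aligned}
\hat{r}_n(k) &:= \frac14 \sum_{j=1}^{n} \sum_{\substack{k_1+k_2+k_3=k \\ k_1,k_2<k}} \binom{k}{k_1,k_2,k_3}\,\hat{\mu}_{j-1}(k_1)\,\hat{\mu}_{n-j}(k_2)\,b_n^{k_3} \\
&= \frac14 \sum_{\substack{k_1+k_2+k_3=k \\ k_1,k_2<k}} \binom{k}{k_1,k_2,k_3}\,b_n^{k_3} \sum_{j=1}^{n} \hat{\mu}_{j-1}(k_1)\,\hat{\mu}_{n-j}(k_2),
\end{aligned}$$

for $n \ge 1$ and $\hat{r}_0(k) := \hat{\mu}_0(k) = \tilde{\mu}_0(k) = (-1)^k C_0^k$. Let $\widehat{R}_k(z)$ denote the ordinary
generating function of $\hat{r}_n(k)$ in the argument $n$. Then, mimicking (2.3),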

$$\widehat{M}_k(z) = \frac{\widehat{R}_k(z)}{\sqrt{1 - z}}, \tag{3.6}$$


JAMES ALLEN FILL AND NEVIN KAPUR 



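with

$$\widehat{R}_k(z) = (-1)^k C_0^k + \sum_{\substack{k_1+k_2+k_3=k \\ k_1,k_2<k}} \binom{k}{k_1,k_2,k_3} \left(B(z)^{\odot k_3}\right) \odot \left[\frac{z}{4}\,\widehat{M}_{k_1}(z)\,\widehat{M}_{k_2}(z)\right], \tag{3.7}$$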

where for $k$ a nonnegative integer

$$B(z)^{\odot k} := \underbrace{B(z) \odot \cdots \odot B(z)}_{k}.$$

Note that $\widehat{M}_0(z) = \mathrm{CAT}(z/4)$.

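Proposition 3.1. Let $\epsilon > 0$ be arbitrary, and define

$$c := \begin{cases}
2\alpha - \epsilon & 0 < \alpha \le 1/4, \\
1/2 & 1/4 < \alpha < 1/2.
\end{cases}$$

Then we have the singular expansion

$$\widehat{M}_k(z) = C_k (1 - z)^{-k(\alpha+\frac12)+\frac12} + O(|1 - z|^{-k(\alpha+\frac12)+\frac12+c}).$$

The $C_k$'s here are defined by the recurrence

$$C_k = \frac14 \sum_{j=1}^{k-1} \binom{k}{j} C_j C_{k-j} + k\,\frac{\Gamma(k\alpha + \frac{k}{2} - 1)}{\Gamma((k-1)\alpha + \frac{k}{2} - 1)}\,C_{k-1}, \quad k \ge 2; \qquad C_1 = \frac{\Gamma(\alpha - \frac12)}{\sqrt{\pi}}. \tag{3.8}$$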

Proof. For $k = 1$ the claim is true as shown in (3.1) with $C_1$ as defined in (3.8).
We will now analyze each term in (3.7) for $k \ge 2$.

One can analyze separately the cases $0 < \alpha \le 1/4$ and $1/4 < \alpha < 1/2$. The proof
technique in either case is induction. We shall treat here the case $0 < \alpha \le 1/4$; the
details in the other case can be found in [7].

For notational convenience, define $\alpha' := \alpha + \frac12$. Also, observe that

$$B(z)^{\odot k} = \mathrm{Li}_{-k\alpha,0}(z) = \Gamma(1 + k\alpha)(1 - z)^{-k\alpha-1} + O(|1 - z|^{-k\alpha-\epsilon})$$

by Lemma 2.5. We shall find that the dominant terms in the sum in (3.7) are those
with (i) $k_3 = 0$, (ii) $(k_1, k_2, k_3) = (k - 1, 0, 1)$, and (iii) $(k_1, k_2, k_3) = (0, k - 1, 1)$.

For this paragraph, consider the case that $k_1$ and $k_2$ are both nonzero. It follows
from the induction hypothesis that

$$\begin{aligned}
\frac{z}{4}\,\widehat{M}_{k_1}(z)\,\widehat{M}_{k_2}(z)
&= \frac14 (1 - (1 - z)) \left[C_{k_1} (1 - z)^{-k_1\alpha'+\frac12} + O(|1 - z|^{-k_1\alpha'+\frac12+(2\alpha-\epsilon)})\right] \\
&\qquad \times \left[C_{k_2} (1 - z)^{-k_2\alpha'+\frac12} + O(|1 - z|^{-k_2\alpha'+\frac12+(2\alpha-\epsilon)})\right] \\
&= \frac14\,C_{k_1} C_{k_2} (1 - z)^{-(k_1+k_2)\alpha'+1} + O(|1 - z|^{-(k_1+k_2)\alpha'+1+(2\alpha-\epsilon)}).
\end{aligned}$$

If $k_3 = 0$ then the corresponding contribution to $\widehat{R}_k(z)$ is

$$\frac14 \binom{k}{k_1} C_{k_1} C_{k_2} (1 - z)^{-k\alpha'+1} + O(|1 - z|^{-k\alpha'+1+(2\alpha-\epsilon)}).$$

LIMITING DISTRIBUTIONS FOR ADDITIVE FUNCTIONALS ON CATALAN TREES 11 



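If $k_3 \ne 0$ we use Lemma 2.6 to express

$$\begin{aligned}
\frac{z}{4}\,\widehat{M}_{k_1}(z)\,\widehat{M}_{k_2}(z)
&= \frac{C_{k_1} C_{k_2}}{4\,\Gamma((k_1+k_2)\alpha' - 1)}\,\mathrm{Li}_{-(k_1+k_2)\alpha'+2,\,0}(z) \\
&\quad + O(|1 - z|^{-(k_1+k_2)\alpha'+1+(2\alpha-\epsilon)}) - [(k_1+k_2)\alpha' < 2]\,\frac{C_{k_1} C_{k_2}}{4}\,\frac{\zeta(-(k_1+k_2)\alpha'+2)}{\Gamma((k_1+k_2)\alpha' - 1)}.
\end{aligned}$$

The corresponding contribution to $\widehat{R}_k(z)$ is then $\binom{k}{k_1,k_2,k_3}$ times:

$$\frac{C_{k_1} C_{k_2}}{4\,\Gamma((k_1+k_2)\alpha' - 1)}\,\mathrm{Li}_{-k\alpha'+\frac{k_3}{2}+2,\,0}(z) + \mathrm{Li}_{-k_3\alpha,0}(z) \odot O(|1 - z|^{-(k_1+k_2)\alpha'+1+(2\alpha-\epsilon)}).$$

Now $k_3 \le k - 2$ so $-k\alpha' + \frac{k_3}{2} + 2 < 1$. Hence the contribution when $k_3 \ne 0$ is

$$O(|1 - z|^{-k\alpha'+\frac{k_3}{2}+1}) = O(|1 - z|^{-k\alpha'+\frac32}) = O(|1 - z|^{-k\alpha'+1+(2\alpha-\epsilon)}).$$

Next we consider the case when $k_1$ is nonzero but $k_2 = 0$. In this case using the
induction hypothesis we see that

$$\begin{aligned}
\frac{z}{4}\,\widehat{M}_{k_1}(z)\,\widehat{M}_{k_2}(z) &= \frac{z}{4}\,\mathrm{CAT}(z/4)\,\widehat{M}_{k_1}(z) \\
&= \frac{1 - (1 - z)^{1/2}}{2} \left[C_{k_1} (1 - z)^{-k_1\alpha'+\frac12}\right] + O(|1 - z|^{-k_1\alpha'+\frac12+(2\alpha-\epsilon)}) \\
&= \frac{C_{k_1}}{2}\,(1 - z)^{-k_1\alpha'+\frac12} + O(|1 - z|^{-k_1\alpha'+\frac12+(2\alpha-\epsilon)}).
\end{aligned}$$

Applying Lemma 2.6 to the last expression we get

$$\frac{z}{4}\,\widehat{M}_{k_1}(z)\,\widehat{M}_{k_2}(z) = \frac{C_{k_1}}{2\,\Gamma(k_1\alpha' - \frac12)}\,\mathrm{Li}_{-k_1\alpha'+\frac32,\,0}(z) + O(|1 - z|^{-k_1\alpha'+\frac12+(2\alpha-\epsilon)}) - \frac{C_{k_1}}{2}\,[k_1\alpha' - \tfrac12 < 1]\,\frac{\zeta(-k_1\alpha'+\frac32)}{\Gamma(k_1\alpha' - \frac12)}.$$

The contribution to $\widehat{R}_k(z)$ is hence $\binom{k}{k_3}$ times:

$$\frac{C_{k_1}}{2\,\Gamma(k_1\alpha' - \frac12)}\,\mathrm{Li}_{-k\alpha'+\frac{k_3}{2}+\frac32,\,0}(z) + \mathrm{Li}_{-k_3\alpha,0}(z) \odot O(|1 - z|^{-k_1\alpha'+\frac12+(2\alpha-\epsilon)}).$$

Using the fact that $\alpha > 0$ and $k_3 \le k - 1$, we conclude that $-k\alpha' + \frac{k_3}{2} + \frac32 < 1$ so
that, by Lemma 2.5 and part (a) of Theorem 2.8, the contribution is

$$O(|1 - z|^{-k\alpha'+\frac{k_3}{2}+\frac12}) = O(|1 - z|^{-k\alpha'+\frac32}),$$

where the displayed equality holds unless $k_3 = 1$. When $k_3 = 1$ we get a
corresponding contribution to $\widehat{R}_k(z)$ of $k$ times:

$$\frac{C_{k-1}\,\Gamma(k\alpha' - 1)}{2\,\Gamma((k-1)\alpha' - \frac12)}\,(1 - z)^{-k\alpha'+1} + O(|1 - z|^{-k\alpha'+1+(2\alpha-\epsilon)}),$$

since for $k \ge 2$ we have $k\alpha' > 1 + (2\alpha - \epsilon)$. The introduction of $\epsilon$ handles the
case when $k\alpha' = 1 + 2\alpha$, which would have otherwise, according to part (c) of
Theorem 2.8, introduced a logarithmic remainder. In either case the remainder
is $O(|1 - z|^{-k\alpha'+1+(2\alpha-\epsilon)})$. The case when $k_2$ is nonzero but $k_1 = 0$ is handled
similarly by exchanging the roles of $k_1$ and $k_2$.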
The final contribution comes from the single term where both $k_1$ and $k_2$ are zero.
In this case the contribution to $\widehat{R}_k(z)$ is, recalling (2.1),

$$\mathrm{Li}_{-k\alpha,0}(z) \odot \left[\frac{z}{4}\,\mathrm{CAT}^2(z/4)\right] = \mathrm{Li}_{-k\alpha,0}(z) \odot (\mathrm{CAT}(z/4) - 1) = \mathrm{Li}_{-k\alpha,0}(z) \odot \mathrm{CAT}(z/4). \tag{3.9}$$

Now, using Theorem 1 of [8],

$$\begin{aligned}
\mathrm{CAT}(z/4) &= 2 - 2(1 - z)^{1/2} + O(|1 - z|) \\
&= 2 + 2\,\frac{\zeta(3/2)}{\Gamma(-1/2)} - \frac{2}{\Gamma(-1/2)}\,\mathrm{Li}_{3/2,0}(z) + O(|1 - z|),
\end{aligned}$$

so that (3.9) is

$$-\frac{2}{\Gamma(-1/2)}\,\mathrm{Li}_{\frac32-k\alpha,0}(z) + \begin{cases}
O(|1 - z|^{1-k\alpha}) & 1 - k\alpha < 0, \\
O(|1 - z|^{-\epsilon}) & 1 - k\alpha = 0, \\
O(1) & 1 - k\alpha > 0.
\end{cases}$$

When $\frac32 - k\alpha < 1$ this is $O(|1 - z|^{-k\alpha+\frac12})$; when $\frac32 - k\alpha \ge 1$, it is $O(1)$. In either
case we get a contribution which is $O(|1 - z|^{-k\alpha'+1+(2\alpha-\epsilon)})$.
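Hence

$$\begin{aligned}
\widehat{R}_k(z) &= \left[\frac14 \sum_{\substack{k_1+k_2=k \\ k_1,k_2<k}} \binom{k}{k_1} C_{k_1} C_{k_2} + 2k\,\frac{C_{k-1}}{2}\,\frac{\Gamma(k\alpha + \frac{k}{2} - 1)}{\Gamma((k-1)\alpha + \frac{k}{2} - 1)}\right] (1 - z)^{-k\alpha'+1} \\
&\quad + O(|1 - z|^{-k\alpha'+1+(2\alpha-\epsilon)}) \\
&= C_k (1 - z)^{-k\alpha'+1} + O(|1 - z|^{-k\alpha'+1+(2\alpha-\epsilon)}),
\end{aligned}$$

with the $C_k$'s defined by the recurrence (3.8). Now using (3.6), the claim follows. □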

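3.2.2. Large toll functions ($\alpha \ge 1/2$). When $\alpha \ge 1/2$ there is no need to apply the
centering techniques. Define $\mu_n(k) := E X_n^k$ and $\bar{\mu}_n(k) := \beta_n \mu_n(k)/4^n$. Let $M_k(z)$
denote the ordinary generating function of $\bar{\mu}_n(k)$ in $n$. Observe that $M_0(z) =
\mathrm{CAT}(z/4)$. As earlier, conditioning on the key stored at the root, we get, for $k \ge 2$,

$$\bar{\mu}_n(k) = \frac12 \sum_{j=1}^{n} \frac{\beta_{n-j}}{4^{n-j}}\,\bar{\mu}_{j-1}(k) + \bar{r}_n(k), \qquad n \ge 1,$$

where

$$\bar{r}_n(k) := \frac14 \sum_{\substack{k_1+k_2+k_3=k \\ k_1,k_2<k}} \binom{k}{k_1,k_2,k_3}\,b_n^{k_3} \sum_{j=1}^{n} \bar{\mu}_{j-1}(k_1)\,\bar{\mu}_{n-j}(k_2),$$

for $n \ge 1$ and $\bar{r}_0(k) := \bar{\mu}_0(k) = \mu_0(k) = 0$. Let $R_k(z)$ denote the ordinary
generating function of $\bar{r}_n(k)$ in $n$. Then

$$M_k(z) = \frac{R_k(z)}{\sqrt{1 - z}}$$

and

$$R_k(z) = \sum_{\substack{k_1+k_2+k_3=k \\ k_1,k_2<k}} \binom{k}{k_1,k_2,k_3} \left(B(z)^{\odot k_3}\right) \odot \left[\frac{z}{4}\,M_{k_1}(z)\,M_{k_2}(z)\right]. \tag{3.10}$$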
We can now state the result about the asymptotics of the generating function $M_k$
when $\alpha > 1/2$. The case $\alpha = 1/2$ will be handled subsequently, in Proposition 3.3.

Proposition 3.2. Let $\epsilon > 0$ be arbitrary, and define

$$c := \begin{cases}
\alpha - \frac12 & 1/2 < \alpha < 1, \\
\frac12 - \epsilon & \alpha = 1, \\
\frac12 & \alpha > 1.
\end{cases} \tag{3.11}$$

Then the generating function $M_k(z)$ of $\bar{\mu}_n(k)$ has the singular expansion

$$M_k(z) = C_k (1 - z)^{-k(\alpha+\frac12)+\frac12} + O(|1 - z|^{-k(\alpha+\frac12)+\frac12+c})$$

for $k \ge 1$, where the $C_k$'s are defined by the recurrence (3.8).



Proof. The proof is very similar to that of Proposition 3.1. We present a sketch.
The reader is invited to compare the cases enumerated below to those in the earlier
proof.

When $k = 1$ the claim is true by (3.5). We analyze the various terms in (3.10)
for $k \ge 2$, employing the notational convenience $\alpha' := \alpha + \frac12$.

When both $k_1$ and $k_2$ are nonzero then the contribution to $R_k(z)$ is

$$\frac14 \binom{k}{k_1} C_{k_1} C_{k_2} (1 - z)^{-k\alpha'+1} + O(|1 - z|^{-k\alpha'+c+1})$$

when $k_3 = 0$ and is $O(|1 - z|^{-k\alpha'+c+1})$ otherwise.

When $k_1$ is nonzero and $k_2 = 0$ the contribution to $R_k(z)$ is

$$k\,\frac{C_{k-1}\,\Gamma(k\alpha' - 1)}{2\,\Gamma((k-1)\alpha' - \frac12)}\,(1 - z)^{-k\alpha'+1} + O(|1 - z|^{-k\alpha'+c+1})$$

when $k_3 = 1$ and $O(|1 - z|^{-k\alpha'+c+1})$ otherwise. The case when $k_2$ is nonzero and
$k_1 = 0$ is identical.

The final contribution comes from the single term when both $k_1$ and $k_2$ are zero.
In this case we get a contribution of $O(|1 - z|^{-k\alpha+\frac12})$ which is $O(|1 - z|^{-k\alpha'+c+1})$.
Adding all these contributions yields the desired result. □

The result when $\alpha = 1/2$ is as follows. Recall that $L(z) := \log((1 - z)^{-1})$.

Proposition 3.3. Let $\alpha = 1/2$. In the notation of Proposition 3.2,

$$M_k(z) = (1 - z)^{-k+\frac12} \sum_{l=0}^{k} C_{k,l}\,L^{k-l}(z) + O(|1 - z|^{-k+1-\epsilon})$$

for $k \ge 1$ and any $\epsilon > 0$, where the $C_{k,l}$'s are constants. The constant multiplying
the lead-order term is given by

$$C_{k,0} = \frac{(2k - 2)!}{2^{2k-2}\,(k - 1)!\,\pi^{k/2}}. \tag{3.12}$$

Proof. We omit the proof, referring the interested reader to [7]. □
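3.3. Asymptotics of moments. For $0 < \alpha < 1/2$, we have seen in Proposition 3.1
that the generating function $\widehat{M}_k(z)$ of $\hat{\mu}_n(k) = \beta_n \tilde{\mu}_n(k)/4^n$ has the singular
expansion

$$\widehat{M}_k(z) = C_k (1 - z)^{-k(\alpha+\frac12)+\frac12} + O(|1 - z|^{-k(\alpha+\frac12)+\frac12+c}),$$

where $c := \min\{2\alpha - \epsilon, 1/2\}$. By singularity analysis [10],

$$\frac{\beta_n \tilde{\mu}_n(k)}{4^n} = \frac{C_k}{\Gamma(k(\alpha + \frac12) - \frac12)}\,n^{k(\alpha+\frac12)-\frac32} + O(n^{k(\alpha+\frac12)-\frac32-c}).$$

Recall that

$$\beta_n = \frac{4^n}{\sqrt{\pi}\,n^{3/2}} \left(1 + O\!\left(\frac1n\right)\right),$$

so that

$$\tilde{\mu}_n(k) = \frac{C_k \sqrt{\pi}}{\Gamma(k(\alpha + \frac12) - \frac12)}\,n^{k(\alpha+\frac12)} + O(n^{k(\alpha+\frac12)-c}). \tag{3.13}$$

For $\alpha > 1/2$ a similar analysis using Proposition 3.2 yields

$$\mu_n(k) = \frac{C_k \sqrt{\pi}}{\Gamma(k(\alpha + \frac12) - \frac12)}\,n^{k(\alpha+\frac12)} + O(n^{k(\alpha+\frac12)-c}), \tag{3.14}$$

with now $c$ as defined at (3.11). Finally, when $\alpha = 1/2$ the asymptotics of the
moments are given by

$$\mu_n(k) = \left(\frac{1}{\sqrt{\pi}}\right)^{k} (n \log n)^k + O(n^k (\log n)^{k-1}). \tag{3.15}$$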
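
The step from a singular expansion to coefficient asymptotics rests on the standard transfer $[z^n](1 - z)^{-a} = \Gamma(n + a)/(\Gamma(a)\,\Gamma(n + 1)) \sim n^{a-1}/\Gamma(a)$, together with $\beta_n/4^n \sim 1/(\sqrt{\pi}\,n^{3/2})$. The following sketch checks both numerically for $a > 0$; it is only an illustration of the leading terms, not of the error bounds, and the function names are hypothetical.

```python
from math import exp, gamma, lgamma, log, pi, sqrt


def power_coeff(a, n):
    """Exact [z^n] (1 - z)^(-a) for a > 0, via log-gamma to avoid overflow."""
    return exp(lgamma(n + a) - lgamma(n + 1)) / gamma(a)


def leading_term(a, n):
    """Leading-order approximation n^(a-1) / Gamma(a)."""
    return n ** (a - 1) / gamma(a)


def catalan_scaled(n):
    """beta_n / 4^n, computed in logarithms."""
    log_value = (lgamma(2 * n + 1) - 2 * lgamma(n + 1)
                 - log(n + 1) - n * log(4))
    return exp(log_value)
```

For instance, `power_coeff(a, n) / leading_term(a, n)` approaches 1 as `n` grows, with a relative correction of order $1/n$, and `catalan_scaled(n) * sqrt(pi) * n ** 1.5` does the same.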



3.4. The limiting distributions. In Section 3.4.1 we will use our moment
estimates (3.13) and (3.14) with the method of moments to derive limiting distributions
for our additive functions. The case $\alpha = 1/2$ requires a somewhat delicate analysis,
which we will present separately in Section 3.4.2.

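3.4.1. $\alpha \ne 1/2$. We first handle the case $0 < \alpha < 1/2$. (We assume this restriction
until just before Proposition 3.5.) We have

$$\tilde{\mu}_n(1) = E \tilde{X}_n = E[X_n - C_0(n + 1)] = \frac{C_1 \sqrt{\pi}}{\Gamma(\alpha)}\,n^{\alpha+\frac12} + O(n^{\alpha+\frac12-c}) \tag{3.16}$$

with $c := \min\{2\alpha - \epsilon, 1/2\}$ and

$$\tilde{\mu}_n(2) = E \tilde{X}_n^2 = \frac{C_2 \sqrt{\pi}}{\Gamma(2\alpha + \frac12)}\,n^{2\alpha+1} + O(n^{2\alpha+1-c}).$$

So

$$\operatorname{Var} X_n = \operatorname{Var} \tilde{X}_n = \tilde{\mu}_n(2) - [\tilde{\mu}_n(1)]^2 = \sigma^2 n^{2\alpha+1} + O(n^{2\alpha+1-c}), \tag{3.17}$$

where

$$\sigma^2 := \frac{C_2 \sqrt{\pi}}{\Gamma(2\alpha + \frac12)} - \frac{C_1^2\,\pi}{\Gamma^2(\alpha)}. \tag{3.18}$$

We also have, for $k \ge 1$,

$$E\left[\frac{\tilde{X}_n}{n^{\alpha+\frac12}}\right]^k = \frac{\tilde{\mu}_n(k)}{n^{k(\alpha+\frac12)}} = \frac{C_k \sqrt{\pi}}{\Gamma(k(\alpha + \frac12) - \frac12)} + O(n^{-c}). \tag{3.19}$$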
The following lemma provides a sufficient bound on the moments facilitating the
use of the method of moments.

Lemma 3.4. Define $\alpha' := \alpha + \frac12$. There exists a constant $A < \infty$ depending only
on $\alpha$ such that

$$\left|\frac{C_k}{k!}\right| \le A^k k^{\alpha' k}$$

for all $k \ge 1$.

Proof. The proof is fairly similar to those of Propositions 3.1, 3.2 and
Proposition 4.1. We omit the details, referring the reader to [7]. □

It follows from Lemma 3.4 and Stirling's approximation that

$$\left|\frac{C_k \sqrt{\pi}}{k!\,\Gamma(k(\alpha + \frac12) - \frac12)}\right| \le B^k \tag{3.20}$$

for large enough $B$ depending only on $\alpha$. Using standard arguments [1, Theorem
30.1] it follows that $\tilde{X}_n$ suitably normalized has a limiting distribution that is
characterized by its moments. Before we state the result, we observe that the
argument presented above can be adapted with minor modifications to treat the
case $\alpha > 1/2$, with $\tilde{X}_n$ replaced by $X_n$. We can now state a result for $\alpha \ne 1/2$. We
will use the notation $\xrightarrow{\mathcal{L}}$ to denote convergence in law (or distribution).

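Proposition 3.5. Let $X_n$ denote the additive functional on Catalan trees induced
by the toll sequence $(n^\alpha)_{n \ge 0}$. Define the random variable $Y_n$ as follows:

$$Y_n := \begin{cases}
\dfrac{X_n - C_0(n + 1)}{n^{\alpha+\frac12}} & 0 < \alpha < 1/2, \\[1ex]
\dfrac{X_n}{n^{\alpha+\frac12}} & \alpha > 1/2,
\end{cases}$$

where

$$C_0 := \sum_{n=0}^{\infty} n^{\alpha}\,\frac{\beta_n}{4^n}, \qquad \beta_n = \frac{1}{n+1}\binom{2n}{n}.$$

Then

$$Y_n \xrightarrow{\mathcal{L}} Y;$$

here $Y$ is a random variable with the unique distribution whose moments are

$$E Y^k = m_k \equiv \frac{C_k \sqrt{\pi}}{\Gamma(k(\alpha + \frac12) - \frac12)}, \tag{3.21}$$

where the $C_k$'s satisfy the recurrence

$$C_k = \frac14 \sum_{j=1}^{k-1} \binom{k}{j} C_j C_{k-j} + k\,\frac{\Gamma(k\alpha + \frac{k}{2} - 1)}{\Gamma((k-1)\alpha + \frac{k}{2} - 1)}\,C_{k-1}, \quad k \ge 2; \qquad C_1 = \frac{\Gamma(\alpha - \frac12)}{\sqrt{\pi}}.$$

The case $\alpha = 1/2$ is handled in Section 3.4.2, leading to Proposition 3.8, and a
unified result for all cases is stated as Theorem 3.10.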
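
The moment sequence in (3.21) is straightforward to evaluate numerically from the recurrence for the $C_k$. The sketch below does so in floating point for $\alpha \ne 1/2$. The function names are illustrative, and the check value $10/3$ for the second moment at $\alpha = 1$ follows directly from the recurrence ($C_1 = 1$, $C_2 = 5/2$, $\Gamma(5/2) = 3\sqrt{\pi}/4$).

```python
from math import comb, gamma, pi, sqrt


def limit_coefficients(alpha, k_max):
    """Return [None, C_1, ..., C_{k_max}] from the recurrence for C_k."""
    C = [None, gamma(alpha - 0.5) / sqrt(pi)]
    for k in range(2, k_max + 1):
        quadratic = sum(comb(k, j) * C[j] * C[k - j] for j in range(1, k)) / 4
        ratio = gamma(k * alpha + k / 2 - 1) / gamma((k - 1) * alpha + k / 2 - 1)
        C.append(quadratic + k * ratio * C[k - 1])
    return C


def limit_moments(alpha, k_max):
    """Return [m_1, ..., m_{k_max}] with m_k = C_k sqrt(pi) / Gamma(k(alpha+1/2) - 1/2)."""
    C = limit_coefficients(alpha, k_max)
    return [C[k] * sqrt(pi) / gamma(k * (alpha + 0.5) - 0.5)
            for k in range(1, k_max + 1)]
```

For $\alpha = 1$ this gives $m_1 = \sqrt{\pi}$ and $m_2 = 10/3$, so the limiting variance is $10/3 - \pi$. Floating-point gamma values overflow for large $k$ or large $\alpha$, so a log-gamma formulation would be needed beyond small cases.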

Remark 3.6. We now consider some properties of the limiting random variable
$Y \equiv Y(\alpha)$ defined by its moments at (3.21) for $\alpha \ne 1/2$.






(a) When $\alpha = 1$, setting $\Omega_k := C_k/2$ we see immediately that

$$E Y^k = \frac{-\Gamma(-1/2)}{\Gamma((3k - 1)/2)}\,\Omega_k,$$

where

$$2\Omega_k = \sum_{j=1}^{k-1} \binom{k}{j} \Omega_j \Omega_{k-j} + k(3k - 4)\,\Omega_{k-1}, \qquad \Omega_1 := \frac12.$$

Thus $Y$ has the ubiquitous Airy distribution and we have recovered the limiting
distribution of path length in Catalan trees [23, 25]. The Airy distribution arises
in many contexts including parking allocations, hashing tables, trees, discrete
random walks, mergesorting, etc.; see, for example, the introduction of [9],
which contains numerous references to the Airy distribution.
(b) When $\alpha = 2$, setting $\eta := Y/\sqrt{2}$ and $a_{0,l} := 2^{2l-1} C_l$, we see that

$$E \eta^l = \frac{\sqrt{\pi}}{2^{(5l-2)/2}\,\Gamma((5l - 1)/2)}\,a_{0,l},$$

where

$$a_{0,l} = \frac12 \sum_{j=1}^{l-1} \binom{l}{j} a_{0,j}\,a_{0,l-j} + l(5l - 4)(5l - 6)\,a_{0,l-1}, \qquad a_{0,1} = 1.$$

We have thus recovered the recurrence for the moments of the distribution
$\mathcal{L}(\eta)$, which arises in the study of the Wiener index of Catalan trees [15, proof
of Theorem 3.3 in Section 5].
(c) Consider the variance $\sigma^2$ defined at (3.18).

(i) Figure 3.1, plotted using Mathematica, suggests that $\sigma^2$ is positive for all
$\alpha > 0$. We will prove this fact in Theorem 3.10. There is also numerical
evidence that $\sigma^2$ is unimodal with $\max_\alpha \sigma^2(\alpha) \doteq 0.198946$ achieved at
$\alpha \doteq 0.682607$. (Here $\doteq$ denotes approximate equality.)

Figure 3.1. $\sigma^2$ of (3.18) as a function of $\alpha$.
(ii) As $\alpha \to \infty$, using Stirling's approximation one can show that
$\sigma^2 \sim (\sqrt{2} - 1)\,\alpha^{-1}$.

(iii) As $\alpha \downarrow 0$, using a Laurent series expansion of $\Gamma(\alpha)$ we see that
$\sigma^2 \sim 4(1 - \log 2)\,\alpha$.

(iv) Though the random variable $Y(\alpha)$ has been defined only for $\alpha \ne 1/2$, the
variance $\sigma^2$ has a limit at $\alpha = 1/2$:

$$\lim_{\alpha \to 1/2} \sigma^2(\alpha) = \frac{8 \log 2}{\pi} - \frac{\pi}{2}. \tag{3.22}$$

(d) Figure 3.2 shows the third central moment $E[Y - EY]^3$ as a function of $\alpha$. The
plot suggests that the third central moment is positive for each $\alpha > 0$, which
would also establish that $Y(\alpha)$ is not normal for any $\alpha > 0$. However we do
not know a proof of this positive skewness. [Of course, the law of $Y(\alpha)$ is not
normal for any $\alpha > 1/2$, since its support is a subset of $[0, \infty)$.]

Figure 3.2. $E[Y - EY]^3$ of Proposition 3.5 as a function of $\alpha$.

(e) When $\alpha = 0$, the additive functional with toll sequence $(n^\alpha = 1)_{n \ge 1}$ is $n$ for all
trees with $n$ nodes. However, if one considers the random variable $\alpha^{-1/2} Y(\alpha)$
as $\alpha \downarrow 0$, using (3.21) and induction one can show that $\alpha^{-1/2} Y(\alpha)$ converges in
distribution to the normal distribution with mean 0 and variance $4(1 - \log 2)$.

(f) Finally, if one considers the random variable $\alpha^{1/2} Y(\alpha)$ as $\alpha \to \infty$, again
using (3.21) and induction we find that $\alpha^{1/2} Y(\alpha)$ converges in distribution to the
unique distribution with $k$th moment $\sqrt{k!}$ for $k = 1, 2, \ldots$. In Remark 3.7 next,
we will show that the limiting distribution has a bounded, infinitely smooth
density on $(0, \infty)$.
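
The limiting behaviour of the variance described in Remark 3.6(c) can be probed numerically. The sketch below evaluates $\sigma^2(\alpha) = C_2\sqrt{\pi}/\Gamma(2\alpha + \tfrac12) - C_1^2\pi/\Gamma^2(\alpha)$ with $C_1 = \Gamma(\alpha - \tfrac12)/\sqrt{\pi}$ and $C_2 = C_1^2/2 + 2\,\Gamma(2\alpha)\,C_1/\Gamma(\alpha)$, which is what the recurrence for $C_k$ gives at $k = 2$. The function name `sigma2` is illustrative. Near $\alpha = 1/2$ the individual terms are large and nearly cancel, so the evaluation there is only accurate to several digits.

```python
from math import gamma, log, pi, sqrt


def sigma2(alpha):
    """Variance constant sigma^2(alpha) for alpha > 0, alpha != 1/2."""
    c1 = gamma(alpha - 0.5) / sqrt(pi)
    c2 = c1 * c1 / 2 + 2 * gamma(2 * alpha) / gamma(alpha) * c1
    second_moment = c2 * sqrt(pi) / gamma(2 * alpha + 0.5)
    first_moment = c1 * sqrt(pi) / gamma(alpha)
    return second_moment - first_moment ** 2


# Reference values: the small-alpha slope, and the limit at alpha = 1/2
# obtained by expanding sigma2 around alpha = 1/2.
SMALL_ALPHA_SLOPE = 4 * (1 - log(2))
LIMIT_AT_HALF = 8 * log(2) / pi - pi / 2
```

Evaluating `sigma2(a) / a` for small `a`, or `sigma2(0.5 + d)` for small `d` of either sign, gives values close to `SMALL_ALPHA_SLOPE` and `LIMIT_AT_HALF` respectively.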

Remark 3.7. Let $Y$ be the unique distribution whose $k$th moment is $\sqrt{k!}$ for $k =
1, 2, \ldots$. Taking $Y^*$ to be an independent copy of $Y$ and defining $X := Y Y^*$, we see
immediately that $X$ is Exponential with unit mean. It follows by taking logarithms
that the distribution of $\log Y$ is a convolution square root of the distribution of
$\log X$. In particular, the characteristic function $\phi$ of $\log Y$ has square equal to
$\Gamma(1 + it)$ at $t \in (-\infty, \infty)$; we note in passing that $\Gamma(1 + it)$ is the characteristic
function of $-G$, where $G$ has the Gumbel distribution. By exponential decay of
$\Gamma(1 + it)$ as $t \to \pm\infty$ and standard theory (see, e.g., [4, Chapter XV]), $\log Y$ has an
infinitely smooth density on $(-\infty, \infty)$, and the density and each of its derivatives
are bounded.

So Y has an infinitely smooth density on (0, 00). By change of variables, the 
density fy of Y satisfies 

/io g y (logy) 



Mv) 



y 



Clearly fy(y) is bounded for y not near 0. (We shall drop further consideration of 
derivatives.) To determine the behavior near 0, we need to know the behavior of 
/logy 0-°&y)/y as V ~ y 0. Using the Fourier inversion formula, we may equivalently 
study 

1 r°° 

e x ho S Y(-x) = —J e^ x 4>(t)dt, 

as x — > 00. By an application of the method of steepest descents [(7.2.11) in [2], 
with go = 1, P = 1/2, w the identity map, zq — 0, and a = 0], we get 

v /7rlog(l/y) 

Hence fy is bounded everywhere. 

Using the Cauchy integral formula and simple estimates, it is easy to show that 

Mv) = o{er My ) asy->oo 

for any M < 00. Computations using the WKB method |T2] suggest 

(3.23) / Y ( 2/ )~(2/ 7 r) 1 /V /2 exp(-y 2 /2) as y -> 00, 

in agreement with numerical calculations using Mathematica. [In fact, the right- 
side of (3.23) appears to be a highly accurate approximation to fy(y) for all y > 1.] 
Figure [3731 depicts the salient features of fy. In particular, note the steep descent 
of fy (y) to as y j and the quasi-Gaussian tail. 
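As a quick plausibility check on the tail approximation (3.23), one can compare the moments of the function $g(y) := (2/\pi)^{1/4} y^{1/2} e^{-y^2/2}$ with the target moments $\sqrt{k!}$ of $Y$. The sketch below is an illustration and not the WKB computation referred to above; the function names are hypothetical. The substitution $u = y^2/2$ gives $\int_0^\infty y^k g(y)\,dy = (2/\pi)^{1/4}\, 2^{k/2 - 1/4}\, \Gamma(k/2 + 3/4)$ in closed form, and by Stirling's formula the ratio of this quantity to $\sqrt{k!}$ should tend to 1 as $k$ grows.

```python
import math


def log_tail_moment(k):
    """Log of the k-th moment of g(y) = (2/pi)^(1/4) * y^(1/2) * exp(-y^2/2).

    With u = y^2/2 the integral over (0, inf) equals
    (2/pi)^(1/4) * 2^(k/2 - 1/4) * Gamma(k/2 + 3/4).
    """
    return (0.25 * math.log(2 / math.pi)
            + (k / 2 - 0.25) * math.log(2)
            + math.lgamma(k / 2 + 0.75))


def log_target_moment(k):
    """Log of sqrt(k!), the k-th moment of Y in Remark 3.7."""
    return 0.5 * math.lgamma(k + 1)


def moment_ratio(k):
    """Ratio (k-th moment of g) / sqrt(k!); computed in logs to avoid overflow."""
    return math.exp(log_tail_moment(k) - log_target_moment(k))


if __name__ == "__main__":
    for k in (0, 1, 2, 5, 10, 50, 200):
        print(k, moment_ratio(k))
```

The ratio at $k = 0$ is below 1, which is expected because $g$ only approximates the tail of $f_Y$ and is not itself a probability density; high moments, which are governed by the tail, match increasingly well.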




Figure 3.3. $f_Y$ of Remark 3.7.

3.4.2. $\alpha = 1/2$. For $\alpha = 1/2$, from (3.15) we see immediately that

\[ \mathbf{E}\left[\frac{X_n}{n \log n}\right] = \frac{1}{\sqrt{\pi}} + O\!\left(\frac{1}{\log n}\right). \]

Thus the random variable $X_n/(n \log n)$ converges in distribution to the degenerate random variable $1/\sqrt{\pi}$. To get a nondegenerate distribution, we carry out an analysis similar to the one that led to (3.4), getting more precise asymptotics for the mean of $X_n$. The refinement of (3.4) that we need is the following, whose proof we omit:

\[ A(z) \odot \mathrm{CAT}(z/4) = \frac{1}{\sqrt{\pi}} (1 - z)^{-1/2} L(z) + D_0 (1 - z)^{-1/2} + O(|1 - z|^{\frac12 - \epsilon}), \]

where

\[ D_0 := \sum_{n=1}^{\infty} n^{1/2} \left[ 4^{-n} \beta_n - \frac{1}{\sqrt{\pi}}\, n^{-3/2} \right]. \tag{3.24} \]

By singularity analysis this leads to

\[ \mathbf{E} X_n = \frac{1}{\sqrt{\pi}}\, n \log n + D_1 n + O(n^{\epsilon}), \tag{3.25} \]

where

\[ D_1 := \frac{1}{\sqrt{\pi}} \left( 2 \log 2 + \gamma + \sqrt{\pi}\, D_0 \right). \tag{3.26} \]
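The constants in (3.24)-(3.26) are easy to probe numerically. The following sketch (hypothetical helper names; floating point only) truncates the series for $D_0$, forms $D_1$, and compares $(\mathbf{E}X_n - \pi^{-1/2} n \log n)/n$ with $D_1$ for a moderate $n$. The exact mean is computed from the standard identity $\beta_n \mathbf{E}X_n / 4^n = \sum_{j=0}^{n-1} \binom{2j}{j} 4^{-j}\, b_{n-j}\, \beta_{n-j} 4^{-(n-j)}$ with $b_n = n^{1/2}$, which is the coefficient form of dividing the generating function of $b_n\beta_n/4^n$ by $\sqrt{1 - z}$; treating that identity as the definition of the mean here is an assumption of this sketch. The value of Euler's constant is hard-coded.

```python
import math

EULER_GAMMA = 0.5772156649015329


def catalan_over_4n(n_max):
    """Return [beta_n / 4^n for n = 0..n_max], beta_n the n-th Catalan number."""
    vals = [1.0]
    for n in range(1, n_max + 1):
        # beta_n / beta_{n-1} = 2(2n-1)/(n+1); the extra 1/4 rescales by 4^n
        vals.append(vals[-1] * (2 * n - 1) / (2 * (n + 1)))
    return vals


def d0(n_terms):
    """Partial sum of the series (3.24); its terms decay like n^(-2)."""
    c = catalan_over_4n(n_terms)
    inv_sqrt_pi = 1 / math.sqrt(math.pi)
    return sum(math.sqrt(n) * (c[n] - inv_sqrt_pi * n ** -1.5)
               for n in range(1, n_terms + 1))


def d1(n_terms):
    """D_1 of (3.26), using the truncated D_0."""
    return (2 * math.log(2) + EULER_GAMMA
            + math.sqrt(math.pi) * d0(n_terms)) / math.sqrt(math.pi)


def exact_mean(n):
    """E X_n for the toll sequence sqrt(n) under the uniform (Catalan) model."""
    c = catalan_over_4n(n)
    # (j + 1) * beta_j / 4^j equals binom(2j, j) / 4^j
    total = sum((j + 1) * c[j] * math.sqrt(n - j) * c[n - j] for j in range(n))
    return total / c[n]


if __name__ == "__main__":
    n = 2000
    lhs = (exact_mean(n) - n * math.log(n) / math.sqrt(math.pi)) / n
    print("D_0 ~", d0(20000), " D_1 ~", d1(20000), " empirical ~", lhs)
```

For small trees the helper can be checked by hand: with two nodes every tree has cost $\sqrt{2} + 1$, which `exact_mean(2)` reproduces.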



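Now analyzing the random variable $X_n - \pi^{-1/2} n \log n$ in a manner similar to that of Section 3.2.1, we obtain

\[ \operatorname{Var}\left[ X_n - \pi^{-1/2} n \log n \right] = \left( \frac{8}{\pi} \log 2 - \frac{\pi}{2} \right) n^2 + O(n^{\frac32 + \epsilon}). \tag{3.27} \]

Using (3.25) and (3.27) we conclude that

\[ \mathbf{E}\left[ \frac{X_n - \pi^{-1/2} n \log n - D_1 n}{n} \right] = o(1) \]

and

\[ \operatorname{Var}\left[ \frac{X_n - \pi^{-1/2} n \log n - D_1 n}{n} \right] \longrightarrow \frac{8}{\pi} \log 2 - \frac{\pi}{2} = \lim_{\alpha \to 1/2} \sigma^2(\alpha), \tag{3.28} \]

where $\sigma^2 \equiv \sigma^2(\alpha)$ is defined at (3.18) for $\alpha \neq 1/2$. [Recall (3.22) of Remark 3.6.]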

It is possible to carry out a program similar to that of Section 3.2 to derive asymptotics of higher-order moments using singularity analysis. However, we choose to sidestep this arduous, albeit mechanical, computation. Instead we will derive the asymptotics of higher moments using a somewhat more direct approach akin to the one employed in [5]. The approach involves approximation of sums by Riemann integrals. To that end, define

\[ \tilde{X}_n := X_n - \pi^{-1/2} (n + 1) \log(n + 1) - D_1 (n + 1), \quad \text{and} \quad \hat{\mu}_n(k) := \frac{\beta_n}{4^{n+1}}\, \mathbf{E} \tilde{X}_n^k. \tag{3.29} \]

Note that $\tilde{X}_0 = -D_1$, $\hat{\mu}_n(0) = \beta_n / 4^{n+1}$, and $\hat{\mu}_0(k) = (-D_1)^k / 4$. Then, in a now familiar manner, for $n \ge 1$ we find

\[ \hat{\mu}_n(k) = 2 \sum_{j=1}^{n} \frac{\beta_{n-j}}{4^{n-j+1}}\, \hat{\mu}_{j-1}(k) + \hat{r}_n(k), \]

where now we define

\[ \hat{r}_n(k) := \sum_{\substack{k_1 + k_2 + k_3 = k \\ k_1, k_2 < k}} \binom{k}{k_1, k_2, k_3} \sum_{j=1}^{n} \hat{\mu}_{j-1}(k_1)\, \hat{\mu}_{n-j}(k_2) \left[ \frac{1}{\sqrt{\pi}} \Bigl( j \log j + (n + 1 - j) \log(n + 1 - j) - (n + 1) \log(n + 1) + \sqrt{\pi}\, n^{1/2} \Bigr) \right]^{k_3}. \]

Passing to generating functions and then back to sequences one gets, for $n \ge 0$,

\[ \hat{\mu}_n(k) = \sum_{j=0}^{n} (j + 1) \frac{\beta_j}{4^j}\, \hat{r}_{n-j}(k). \]

Using induction on $k$, we can approximate $\hat{r}_n(k)$ and $\hat{\mu}_n(k)$ above by integrals and obtain the following result. We omit the proof, leaving it as an exercise for the ambitious reader.
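The passage from the recurrence to the convolution formula rests on the identity $1 - \tfrac{z}{2}\mathrm{CAT}(z/4) = \sqrt{1 - z}$ together with $[z^j](1 - z)^{-1/2} = (j + 1)\beta_j / 4^j$. A small exact check with rational arithmetic can make this concrete. In the sketch below the sequence `r` is arbitrary test data, the function names are hypothetical, and the convention $\mu_0 := r_0$ for the initial term is an assumption made so that both forms are defined at $n = 0$.

```python
from fractions import Fraction
from math import comb


def catalan(n):
    """n-th Catalan number beta_n."""
    return comb(2 * n, n) // (n + 1)


def solve_recurrence(r):
    """mu_0 = r_0 and, for n >= 1,
    mu_n = 2 * sum_{j=1}^{n} beta_{n-j} / 4^(n-j+1) * mu_{j-1} + r_n."""
    mu = [Fraction(r[0])]
    for n in range(1, len(r)):
        s = sum(Fraction(catalan(n - j), 4 ** (n - j + 1)) * mu[j - 1]
                for j in range(1, n + 1))
        mu.append(2 * s + r[n])
    return mu


def convolve(r):
    """mu_n = sum_{j=0}^{n} (j + 1) * beta_j / 4^j * r_{n-j}."""
    return [sum(Fraction((j + 1) * catalan(j), 4 ** j) * r[n - j]
                for j in range(n + 1))
            for n in range(len(r))]


if __name__ == "__main__":
    r = [Fraction(3), Fraction(-1, 2), Fraction(7), Fraction(0), Fraction(5, 3)]
    print(solve_recurrence(r) == convolve(r))
```

Because the arithmetic is exact, agreement of the two lists for arbitrary input is a direct confirmation of the generating-function step for the initial coefficients.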

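Proposition 3.8. Let $X_n$ be the additive functional induced by the toll sequence $(n^{1/2})_{n \ge 1}$ on Catalan trees. Define $\tilde{X}_n$ as in (3.29), with $D_1$ defined at (3.26) and $D_0$ at (3.24). Then

\[ \mathbf{E}\bigl[ \tilde{X}_n / n \bigr]^k = m_k + o(1) \quad \text{as } n \to \infty, \]

where $m_0 = 1$, $m_1 = 0$, and, for $k \ge 2$,

\[ m_k = \frac{1}{4\sqrt{\pi}}\, \frac{\Gamma(k - 1)}{\Gamma(k - \frac12)} \left[ \sum_{\substack{k_1 + k_2 + k_3 = k \\ k_1, k_2 < k}} \binom{k}{k_1, k_2, k_3} m_{k_1} m_{k_2} \left( \frac{1}{\sqrt{\pi}} \right)^{k_3} J_{k_1, k_2, k_3} + 4\sqrt{\pi}\, k\, m_{k-1} \right], \tag{3.30} \]

where

\[ J_{k_1, k_2, k_3} := \int_0^1 x^{k_1 - \frac32} (1 - x)^{k_2 - \frac32} \bigl[ x \log x + (1 - x) \log(1 - x) \bigr]^{k_3}\, dx. \]

Furthermore, $\tilde{X}_n / (n + 1) \xrightarrow{\mathcal{L}} Y$, where $Y$ is a random variable with the unique distribution whose moments are $\mathbf{E} Y^k = m_k$, $k \ge 0$.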
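For $k = 2$ the recurrence (3.30) collapses to a single term: since $m_1 = 0$ and $m_0 = 1$, only $(k_1, k_2, k_3) = (0, 0, 2)$ contributes, giving $m_2 = J_{0,0,2} / (2\pi^2)$. By (3.28) this should agree with $\frac{8}{\pi}\log 2 - \frac{\pi}{2}$. The sketch below evaluates $J_{0,0,2}$ by composite Simpson quadrature; the substitution $x = t^2$ and the use of the symmetry $x \leftrightarrow 1 - x$ are implementation choices of this illustration, and the function names are hypothetical.

```python
import math


def entropy_term(x):
    """x*log(x) + (1-x)*log(1-x), using log1p for accuracy near x = 0."""
    return x * math.log(x) + (1 - x) * math.log1p(-x)


def j002(n_steps=20000):
    """Composite Simpson estimate of J_{0,0,2} from Proposition 3.8.

    Integrates over (0, 1/2] only (the integrand is symmetric about 1/2)
    after substituting x = t^2, which removes the square-root-type
    endpoint behaviour at x = 0. n_steps must be even.
    """
    upper = math.sqrt(0.5)
    h = upper / n_steps

    def g(t):
        if t == 0.0:
            return 0.0  # limiting value of the transformed integrand
        x = t * t
        return 2 * t * (x * (1 - x)) ** -1.5 * entropy_term(x) ** 2

    total = g(0.0) + g(upper)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return 2 * (h / 3) * total


def m2():
    """m_2 from (3.30): prefactor * (1/sqrt(pi))^2 * J_{0,0,2}."""
    prefactor = math.gamma(1) / (4 * math.sqrt(math.pi) * math.gamma(1.5))
    return prefactor * (1 / math.pi) * j002()


if __name__ == "__main__":
    print(m2(), 8 * math.log(2) / math.pi - math.pi / 2)
```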

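3.4.3. A unified result. The approach outlined in the preceding section can also be used for the case $\alpha \neq 1/2$. For completeness, we state the result for that case here (without proof).

Proposition 3.9. Let $X_n$ be the additive functional induced by the toll sequence $(n^{\alpha})_{n \ge 1}$ on Catalan trees. Let $\alpha' := \alpha + \frac12$. Define $\tilde{X}_n$ as

\[ \tilde{X}_n := \begin{cases} X_n - C_0 (n + 1) - \dfrac{\Gamma(\alpha - \frac12)}{\Gamma(\alpha)} (n + 1)^{\alpha'}, & 0 < \alpha < 1/2, \\[2ex] X_n - \dfrac{\Gamma(\alpha - \frac12)}{\Gamma(\alpha)} (n + 1)^{\alpha'}, & \alpha > 1/2, \end{cases} \tag{3.31} \]

where

\[ C_0 := \sum_{n=1}^{\infty} n^{\alpha} \frac{\beta_n}{4^n}. \]

Then, for $k = 0, 1, 2, \ldots$,

\[ \mathbf{E}\bigl[ \tilde{X}_n / n^{\alpha'} \bigr]^k = m_k + o(1) \quad \text{as } n \to \infty, \]

where $m_0 = 1$, $m_1 = 0$, and, for $k \ge 2$,

\[ m_k = \frac{1}{4\sqrt{\pi}}\, \frac{\Gamma(k\alpha' - 1)}{\Gamma(k\alpha' - \frac12)} \left[ \sum_{\substack{k_1 + k_2 + k_3 = k \\ k_1, k_2 < k}} \binom{k}{k_1, k_2, k_3} m_{k_1} m_{k_2} \left( \frac{\Gamma(\alpha - \frac12)}{\Gamma(\alpha)} \right)^{k_3} J_{k_1, k_2, k_3} + 4\sqrt{\pi}\, k\, m_{k-1} \right], \tag{3.32} \]

with

\[ J_{k_1, k_2, k_3} := \int_0^1 x^{k_1 \alpha' - \frac32} (1 - x)^{k_2 \alpha' - \frac32} \bigl[ x^{\alpha'} + (1 - x)^{\alpha'} - 1 \bigr]^{k_3}\, dx. \]

Furthermore, $\tilde{X}_n / n^{\alpha'} \xrightarrow{\mathcal{L}} Y$, where $Y$ is a random variable with the unique distribution whose moments are $\mathbf{E} Y^k = m_k$.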

[The reader may wonder why we have chosen to state Proposition 3.9 using several instances of $n + 1$, rather than $n$, in (3.31). The reason is that use of $n + 1$ is somewhat more natural in the calculations that establish the proposition.]
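The second moment $m_2$ of Proposition 3.9 can be evaluated numerically and compared with the facts about $\sigma^2(\alpha)$ recorded in Remark 3.6: for $k = 2$ only the term $(k_1, k_2, k_3) = (0, 0, 2)$ of (3.32) survives, so that $m_2 = \frac{1}{4\sqrt{\pi}} \frac{\Gamma(2\alpha' - 1)}{\Gamma(2\alpha' - \frac12)} \bigl(\frac{\Gamma(\alpha - \frac12)}{\Gamma(\alpha)}\bigr)^2 J_{0,0,2}$. The sketch below is a rough floating-point illustration under these assumptions, with hypothetical function names. It uses Simpson quadrature with the substitution $x = t^2$, and is intended for $\alpha$ not too close to $0$ or to $1/2$, where the quadrature and the Gamma factors respectively become delicate.

```python
import math


def j002(alpha_prime, n_steps=20000):
    """Simpson estimate of J_{0,0,2} in Proposition 3.9 (n_steps even)."""
    upper = math.sqrt(0.5)
    h = upper / n_steps

    def g(t):
        if t == 0.0:
            return 0.0  # limiting value of the transformed integrand
        x = t * t
        # x^a' + (1-x)^a' - 1, written with expm1/log1p to limit cancellation
        bracket = x ** alpha_prime + math.expm1(alpha_prime * math.log1p(-x))
        return 2 * t * (x * (1 - x)) ** -1.5 * bracket ** 2

    total = g(0.0) + g(upper)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * g(i * h)
    # factor 2: the integrand is symmetric under x <-> 1 - x
    return 2 * (h / 3) * total


def sigma2(alpha):
    """m_2 of (3.32) for alpha > 0, alpha != 1/2."""
    ap = alpha + 0.5
    # lgamma returns log|Gamma|; only the square of the Gamma ratio is needed
    log_pref = (math.lgamma(2 * ap - 1) - math.lgamma(2 * ap - 0.5)
                + 2 * (math.lgamma(alpha - 0.5) - math.lgamma(alpha)))
    return math.exp(log_pref) * j002(ap) / (4 * math.sqrt(math.pi))


if __name__ == "__main__":
    print(sigma2(0.682607))        # compare with the reported maximum
    print(400 * sigma2(400.0))     # compare with sqrt(2) - 1
```

Evaluating near $\alpha = 0.682607$ gives a value to compare with the reported maximum of $\sigma^2$, and $\alpha\,\sigma^2(\alpha)$ for large $\alpha$ can be compared with $\sqrt{2} - 1$ from Remark 3.6.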

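In light of Propositions 3.5, 3.8, and 3.9, there are a variety of ways to state a unified result. We state one such version here.

Theorem 3.10. Let $X_n$ denote the additive functional induced by the toll sequence $(n^{\alpha})_{n \ge 1}$ on Catalan trees. Then

\[ \frac{X_n - \mathbf{E} X_n}{\sqrt{\operatorname{Var} X_n}} \xrightarrow{\mathcal{L}} W, \]

where the distribution of $W$ is described as follows:

(a) For $\alpha \neq 1/2$,

\[ W = \frac{1}{\sigma} \left( Y - \frac{C_1 \sqrt{\pi}}{\Gamma(\alpha)} \right), \quad \text{with} \quad \sigma^2 := \frac{C_2 \sqrt{\pi}}{\Gamma(2\alpha + \frac12)} - \frac{C_1^2\, \pi}{\Gamma^2(\alpha)} > 0, \]

where $Y$ is a random variable with the unique distribution whose moments are

\[ \mathbf{E} Y^k = \frac{C_k \sqrt{\pi}}{\Gamma\bigl(k(\alpha + \frac12) - \frac12\bigr)}, \]

and the $C_k$'s satisfy the recurrence used in Proposition 3.5.

(b) For $\alpha = 1/2$,

\[ W = \frac{Y}{\sigma}, \quad \text{with} \quad \sigma^2 := \frac{8}{\pi} \log 2 - \frac{\pi}{2}, \]

where $Y$ is a random variable with the unique distribution whose moments $m_k = \mathbf{E} Y^k$ are given by (3.30).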

Proof. Define

\[ W_n := \frac{X_n - \mathbf{E} X_n}{\sqrt{\operatorname{Var} X_n}}. \]

(a) Consider first the case $\alpha < 1/2$ and let $\alpha' := \alpha + \frac12$. By (3.16),

\[ \mathbf{E} X_n = C_0 (n + 1) + \frac{\Gamma(\alpha - \frac12)}{\Gamma(\alpha)}\, n^{\alpha'} + o(n^{\alpha'}). \tag{3.33} \]

Since $\tilde{X}_n$ defined at (3.31) and $X_n$ differ by a deterministic amount, $\operatorname{Var} X_n = \operatorname{Var} \tilde{X}_n$. Now by Proposition 3.9,

\[ \operatorname{Var} \tilde{X}_n = \mathbf{E} \tilde{X}_n^2 - (\mathbf{E} \tilde{X}_n)^2 = (m_2 + o(1))\, n^{2\alpha'} - (m_1^2 + o(1))\, n^{2\alpha'} = (m_2 + o(1))\, n^{2\alpha'}. \tag{3.34} \]

So $\sigma^2$ equals $m_2$ defined at (3.32), namely,

\[ \sigma^2 = \frac{1}{4\sqrt{\pi}}\, \frac{\Gamma(2\alpha' - 1)}{\Gamma(2\alpha' - \frac12)} \left( \frac{\Gamma(\alpha - \frac12)}{\Gamma(\alpha)} \right)^2 J_{0,0,2}. \]

Thus to show $\sigma^2 > 0$ it is enough to show that $J_{0,0,2} > 0$. But

\[ J_{0,0,2} = \int_0^1 x^{-3/2} (1 - x)^{-3/2} \bigl[ x^{\alpha'} + (1 - x)^{\alpha'} - 1 \bigr]^2\, dx, \]

which is clearly positive. Using (3.33) and (3.34),

\[ W_n = \frac{X_n - C_0 (n + 1) - \frac{\Gamma(\alpha - \frac12)}{\Gamma(\alpha)}\, n^{\alpha'} + o(n^{\alpha'})}{(1 + o(1))\, \sigma\, n^{\alpha'}}, \]

so, by Proposition 3.5 and Slutsky's theorem [1, Theorem 25.4], the claim follows. The case $\alpha > 1/2$ follows similarly.

(b) When $\alpha = 1/2$,

\[ \mathbf{E} X_n = \frac{1}{\sqrt{\pi}}\, n \log n + D_1 n + o(n) \]

by (3.25) and

\[ \operatorname{Var} X_n = \left( \frac{8}{\pi} \log 2 - \frac{\pi}{2} + o(1) \right) n^2 \]

by (3.28). The claim then follows easily from Proposition 3.8 and Slutsky's theorem. □

4. The shape functional

We now turn our attention to the shape functional for Catalan trees. The shape functional is the cost induced by the toll function $b_n \equiv \log n$, $n \ge 1$. For background and results on the shape functional, we refer the reader to [5] and [19].

In the sequel we will improve on the mean and variance estimates obtained in [5] and derive a central limit theorem for the shape functional for Catalan trees. The technique employed is singularity analysis followed by the method of moments.

4.1. Mean. We use the notation and techniques of Section 3.1 again. Observe that now $B(z) = \mathrm{Li}_{0,1}(z)$ and (3.2) gives the singular expansion

\[ \mathrm{CAT}(z/4) = 2 - \frac{2}{\Gamma(-1/2)} \bigl[ \mathrm{Li}_{3/2,0}(z) - \zeta(3/2) \bigr] + 2 \left( 1 - \frac{\zeta(1/2)}{\Gamma(-1/2)} \right) (1 - z) + O(|1 - z|^{3/2}). \]

So

\[ B(z) \odot \mathrm{CAT}(z/4) = -\frac{2}{\Gamma(-1/2)}\, \mathrm{Li}_{3/2,1}(z) + c + \tilde{c}\,(1 - z) + O(|1 - z|^{\frac32 - \epsilon}), \]

where $c$ and $\tilde{c}$ denote unspecified (possibly $0$) constants. The constant term in the singular expansion of $B(z) \odot \mathrm{CAT}(z/4)$ is already known to be

\[ C_0 = B(z) \odot \mathrm{CAT}(z/4) \Big|_{z=1} = \sum_{n=1}^{\infty} (\log n) \frac{\beta_n}{4^n}. \]

Now using the singular expansion of $\mathrm{Li}_{3/2,1}(z)$, we get

\[ B(z) \odot \mathrm{CAT}(z/4) = C_0 - 2 (1 - z)^{1/2} L(z) - 2 \bigl( 2(1 - \log 2) - \gamma \bigr) (1 - z)^{1/2} + O(|1 - z|), \]

so that

\[ A(z) \odot \mathrm{CAT}(z/4) = C_0 (1 - z)^{-1/2} - 2 L(z) - 2 \bigl( 2(1 - \log 2) - \gamma \bigr) + O(|1 - z|^{1/2}). \tag{4.1} \]

Using singularity analysis and the asymptotics of the Catalan numbers we get that the mean $a_n$ of the shape functional is given by

\[ a_n = C_0 (n + 1) - 2\sqrt{\pi}\, n^{1/2} + O(1), \tag{4.2} \]

which agrees with the estimate in Theorem 3.1 of [5] and improves the remainder estimate.
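The estimate (4.2) can be examined numerically for moderate $n$. The sketch below (hypothetical helper names, floating point) approximates $C_0$ by a truncated sum plus an integral estimate of the tail, computes the exact mean of the shape functional from the coefficient identity $\beta_n a_n / 4^n = \sum_{j=0}^{n-1} \binom{2j}{j} 4^{-j} (\log(n - j))\, \beta_{n-j} 4^{-(n-j)}$, and reports the remainder $a_n - C_0 (n + 1) + 2\sqrt{\pi n}$, which should stay bounded. The tail estimate uses only the leading asymptotics $\beta_n / 4^n \sim n^{-3/2} / \sqrt{\pi}$ and is an assumption of this illustration, not a method taken from the paper.

```python
import math


def catalan_over_4n(n_max):
    """Return [beta_n / 4^n for n = 0..n_max]."""
    vals = [1.0]
    for n in range(1, n_max + 1):
        vals.append(vals[-1] * (2 * n - 1) / (2 * (n + 1)))
    return vals


def c0(n_terms=100000):
    """Approximate C_0 = sum_{n>=1} (log n) * beta_n / 4^n."""
    c = catalan_over_4n(n_terms)
    partial = sum(math.log(n) * c[n] for n in range(2, n_terms + 1))
    # Tail: integral of log(x) * x^(-3/2) / sqrt(pi) over (m, inf)
    # equals 2 * (log(m) + 2) / (sqrt(pi) * sqrt(m)).
    m = n_terms + 0.5
    tail = 2 * (math.log(m) + 2) / (math.sqrt(math.pi) * math.sqrt(m))
    return partial + tail


def exact_mean(n):
    """Exact mean a_n of the shape functional on Catalan trees with n nodes."""
    c = catalan_over_4n(n)
    total = sum((j + 1) * c[j] * math.log(n - j) * c[n - j] for j in range(n))
    return total / c[n]


def remainder(n, c0_value):
    """a_n - C_0 (n + 1) + 2 sqrt(pi n), expected to be O(1) by (4.2)."""
    return exact_mean(n) - c0_value * (n + 1) + 2 * math.sqrt(math.pi * n)


if __name__ == "__main__":
    c0_value = c0()
    for n in (10, 100, 500, 2000):
        print(n, remainder(n, c0_value))
```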

4.2. Second moment and variance. We now derive the asymptotics of the approximately centered second moment and the variance of the shape functional. These estimates will serve as the basis for the induction to follow. We will use the notation of Section 3.2.1, centering the cost function as before by $C_0 (n + 1)$.

It is clear from (4.1) that

\[ \widehat{M}_1(z) = -2 L(z) - 2 \bigl( 2(1 - \log 2) - \gamma \bigr) + O(|1 - z|^{1/2}), \tag{4.3} \]

and (3.7) with $k = 2$ gives us, recalling (2.1),

\[ R_2(z) = C_0^2 + \mathrm{CAT}(z/4) \odot \mathrm{Li}_{0,2}(z) + 4\, \mathrm{Li}_{0,1}(z) \odot \left[ \frac{z}{4}\, \mathrm{CAT}(z/4)\, \widehat{M}_1(z) \right] + \frac{z}{2}\, \widehat{M}_1^2(z). \tag{4.4} \]

We analyze each of the terms in this sum. For the last term, observe that $z/2 \to 1/2$ as $z \to 1$, so that

\[ \frac{z}{2}\, \widehat{M}_1^2(z) = 2 L^2(z) + 4 \bigl( 2(1 - \log 2) - \gamma \bigr) L(z) + 2 \bigl( 2(1 - \log 2) - \gamma \bigr)^2 + O(|1 - z|^{\frac12 - \epsilon}), \]

the $\epsilon$ introduced to avoid logarithmic remainders. The first term is easily seen to be

\[ \mathrm{CAT}(z/4) \odot \mathrm{Li}_{0,2}(z) = K + O(|1 - z|^{\frac12 - \epsilon}), \]

where

\[ K := \sum_{n=1}^{\infty} (\log n)^2 \frac{\beta_n}{4^n}. \]

For the middle term, first observe that

\[ \frac{z}{4}\, \mathrm{CAT}(z/4)\, \widehat{M}_1(z) = -L(z) - \bigl( 2(1 - \log 2) - \gamma \bigr) + (1 - z)^{1/2} L(z) + O(|1 - z|^{1/2}) \]

and that $L(z) = \mathrm{Li}_{1,0}(z)$. Thus the third term on the right in (4.4) is $4$ times

\[ -\mathrm{Li}_{1,1}(z) + c + O(|1 - z|^{\frac12 - 2\epsilon}) = -\frac12 L^2(z) + \gamma L(z) + c + O(|1 - z|^{\frac12 - \epsilon}). \]

[The singular expansion for $\mathrm{Li}_{1,1}(z)$ was obtained using the results at the bottom of p. 379 in [8]. We state it here for the reader's convenience:

\[ \mathrm{Li}_{1,1}(z) = \frac12 L^2(z) - \gamma L(z) + c + O(|1 - z|), \]

where $c$ is again an unspecified constant.] Hence

\[ R_2(z) = 8 (1 - \log 2) L(z) + c + O(|1 - z|^{\frac12 - \epsilon}), \]

which leads to

\[ \widehat{M}_2(z) = 8 (1 - \log 2) (1 - z)^{-1/2} L(z) + c\, (1 - z)^{-1/2} + O(|1 - z|^{-\epsilon}). \tag{4.5} \]

We draw the attention of the reader to the cancellation of the ostensible lead-order term $L^2(z)$. This kind of cancellation will appear again in the next section when we deal with higher moments.
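The cancellation can be followed by bookkeeping the coefficients of $L^2(z)$ and $L(z)$ in the two contributions that contain logarithms: the last term of (4.4) contributes $2L^2 + 4(2(1 - \log 2) - \gamma)L$, and the middle term contributes $4(-\tfrac12 L^2 + \gamma L)$. The few lines below simply add these coefficients; they are an arithmetic illustration with hypothetical variable names and a hard-coded value of Euler's constant.

```python
import math

EULER_GAMMA = 0.5772156649015329

# Constant appearing in (4.3): 2(1 - log 2) - gamma
c1 = 2 * (1 - math.log(2)) - EULER_GAMMA

# Coefficients of (L^2, L) in each logarithmic contribution to R_2(z)
last_term = (2.0, 4 * c1)                    # (z/2) * M_1(z)^2
middle_term = (4 * -0.5, 4 * EULER_GAMMA)    # 4 * (-(1/2) L^2 + gamma * L)

l2_coeff = last_term[0] + middle_term[0]     # coefficient of L^2 in R_2
l1_coeff = last_term[1] + middle_term[1]     # coefficient of L in R_2

if __name__ == "__main__":
    print("L^2 coefficient:", l2_coeff)
    print("L coefficient:", l1_coeff, "vs 8(1 - log 2) =", 8 * (1 - math.log(2)))
```

The $\gamma$ terms cancel as well, which is why the leading constant in $R_2(z)$ is $8(1 - \log 2)$ with no Euler constant.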

Now using singularity analysis and estimates for the Catalan numbers we get

\[ \tilde{\mu}_n(2) = 8 (1 - \log 2)\, n \log n + c n + O(n^{\frac12 + \epsilon}). \tag{4.6} \]

Using (4.2),

\[ \operatorname{Var} X_n = \tilde{\mu}_n(2) - \tilde{\mu}_n(1)^2 = 8 (1 - \log 2)\, n \log n + c n + O(n^{\frac12 + \epsilon}), \]

which agrees with Theorem 3.1 of [5] (after a correction pointed out in [19]) and improves the remainder estimate. In our subsequent analysis we will not need to evaluate the unspecified constant $c$.

4.3. Higher moments. We now turn our attention to deriving the asymptotics of higher moments of the shape functional. The main result is as follows.

Proposition 4.1. Define $\tilde{X}_n := X_n - C_0 (n + 1)$, with $X_0 := 0$; $\tilde{\mu}_n(k) := \mathbf{E} \tilde{X}_n^k$, with $\tilde{\mu}_n(0) = 1$ for all $n \ge 0$; and $\hat{\mu}_n(k) := \beta_n \tilde{\mu}_n(k) / 4^n$. Let $\widehat{M}_k(z)$ denote the ordinary generating function of $\hat{\mu}_n(k)$ in the argument $n$. For $k \ge 2$, $\widehat{M}_k(z)$ has the singular expansion

\[ \widehat{M}_k(z) = (1 - z)^{-\frac{k-1}{2}} \sum_{j=0}^{\lfloor k/2 \rfloor} C_{k,j}\, L^{\lfloor k/2 \rfloor - j}(z) + O(|1 - z|^{-\frac{k}{2} + 1 - \epsilon}), \]

with

\[ C_{2l,0} = \frac14 \sum_{j=1}^{l-1} \binom{2l}{2j} C_{2j,0}\, C_{2l-2j,0}, \qquad C_{2,0} = 8 (1 - \log 2). \]

Proof. The proof is by induction. For $k = 2$ the claim is true by (4.5). We note that the claim is not true for $k = 1$. Instead, recalling (4.3),

\[ \widehat{M}_1(z) = -2 L(z) - 2 \bigl( 2(1 - \log 2) - \gamma \bigr) + O(|1 - z|^{1/2}). \tag{4.7} \]

For the induction step, let $k \ge 3$. We will first get the asymptotics of $R_k(z)$ defined at (3.7) with $B(z) = \mathrm{Li}_{0,1}(z)$. In order to do that we will obtain the asymptotics of each term in the defining sum. We remind the reader that we are only interested in the form of the asymptotic expansion of $R_k(z)$ and the coefficient of the lead-order term when $k$ is even. This allows us to "define away" all other constants, their determination delayed to the time when the need arises.

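For this paragraph suppose that $k_1 \ge 2$ and $k_2 \ge 2$. Then by the induction hypothesis

\[ \frac{z}{4}\, \widehat{M}_{k_1}(z)\, \widehat{M}_{k_2}(z) = \frac14 (1 - z)^{-\frac{k_1 + k_2}{2} + 1} \sum_{l=0}^{\lfloor k_1/2 \rfloor + \lfloor k_2/2 \rfloor} A_{k_1, k_2, l}\, L^{\lfloor k_1/2 \rfloor + \lfloor k_2/2 \rfloor - l}(z) + O(|1 - z|^{-\frac{k_1 + k_2}{2} + \frac32 - \epsilon}), \tag{4.8} \]

where $A_{k_1, k_2, 0} = C_{k_1, 0}\, C_{k_2, 0}$. (a) If $k_3 = 0$ then $k_1 + k_2 = k$ and the corresponding contribution to $R_k(z)$ is given by

\[ \frac14 \binom{k}{k_1} (1 - z)^{-\frac{k}{2} + 1} \sum_{l=0}^{\lfloor k_1/2 \rfloor + \lfloor (k - k_1)/2 \rfloor} A_{k_1, k - k_1, l}\, L^{\lfloor k_1/2 \rfloor + \lfloor (k - k_1)/2 \rfloor - l}(z) + O(|1 - z|^{-\frac{k}{2} + \frac32 - \epsilon}). \tag{4.9} \]

Observe that if $k$ is even and $k_1$ is odd the highest power of $L(z)$ in (4.9) is $\lfloor k/2 \rfloor - 1$. In all other cases the highest power of $L(z)$ in (4.9) is $\lfloor k/2 \rfloor$. (b) If $k_3 \neq 0$ then we use Lemma 2.6 to express (4.8) as a linear combination of

\[ \Bigl\{ \mathrm{Li}_{-\frac{k_1 + k_2}{2} + 2,\; l}(z) \Bigr\}_{l=0}^{\lfloor k_1/2 \rfloor + \lfloor k_2/2 \rfloor} \]

with a remainder that is $O(|1 - z|^{-\frac{k_1 + k_2}{2} + \frac32 - \epsilon})$. When we take the Hadamard product of such a term with $\mathrm{Li}_{0, k_3}(z)$ we will get a linear combination of

\[ \Bigl\{ \mathrm{Li}_{-\frac{k_1 + k_2}{2} + 2,\; l + k_3}(z) \Bigr\}_{l=0}^{\lfloor k_1/2 \rfloor + \lfloor k_2/2 \rfloor} \]

and a smaller remainder. Such terms are all $O(|1 - z|^{-\frac{k_1 + k_2}{2} + 1 - \epsilon})$, so that the contribution is $O(|1 - z|^{-\frac{k}{2} + \frac32 - \epsilon})$.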

Next, consider the case when $k_1 = 1$ and $k_2 \ge 2$. Using the induction hypothesis and (4.7) we get

\[ \frac{z}{4}\, \widehat{M}_{k_1}(z)\, \widehat{M}_{k_2}(z) = -\frac12 (1 - z)^{-\frac{k_2 - 1}{2}} \sum_{j=0}^{\lfloor k_2/2 \rfloor + 1} B_{k_2, j}\, L^{\lfloor k_2/2 \rfloor + 1 - j}(z) + O(|1 - z|^{-\frac{k_2}{2} + 1 - 2\epsilon}), \tag{4.10} \]

with $B_{k_2, 0} = C_{k_2, 0}$. (a) If $k_3 = 0$ then $k_2 = k - 1$ and the corresponding contribution to $R_k(z)$ is given by

\[ -\frac{k}{2} (1 - z)^{-\frac{k}{2} + 1} \sum_{j=0}^{\lfloor (k-1)/2 \rfloor + 1} B_{k-1, j}\, L^{\lfloor (k-1)/2 \rfloor + 1 - j}(z) + O(|1 - z|^{-\frac{k}{2} + \frac32 - 2\epsilon}). \tag{4.11} \]

(b) If $k_3 \neq 0$ then Lemma 2.6 can be used once again to express (4.10) in terms of generalized polylogarithms, whence an argument similar to that at the end of the preceding paragraph yields that the contribution to $R_k(z)$ from such terms is $O(|1 - z|^{-\frac{k_2}{2} + \frac12 - \epsilon})$, which is $O(|1 - z|^{-\frac{k}{2} + \frac32 - \epsilon})$. The case when $k_1 \ge 2$ and $k_2 = 1$ is handled symmetrically.

When $k_1 = k_2 = 1$ then $(z/4)\, \widehat{M}_{k_1}(z)\, \widehat{M}_{k_2}(z)$ is $O(|1 - z|^{-\epsilon})$ and when one takes the Hadamard product of this term with $\mathrm{Li}_{0, k_3}(z)$ the contribution will be $O(|1 - z|^{-2\epsilon})$.

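Now consider the case when $k_1 = 0$ and $k_2 \ge 2$. Since $\widehat{M}_0(z) = \mathrm{CAT}(z/4)$, we have

\[ \frac{z}{4}\, \widehat{M}_{k_1}(z)\, \widehat{M}_{k_2}(z) = \frac12 (1 - z)^{-\frac{k_2 - 1}{2}} \sum_{j=0}^{\lfloor k_2/2 \rfloor} C_{k_2, j}\, L^{\lfloor k_2/2 \rfloor - j}(z) + O(|1 - z|^{-\frac{k_2}{2} + 1 - \epsilon}). \tag{4.12} \]

By Lemma 2.6 this can be expressed as a linear combination of

\[ \Bigl\{ \mathrm{Li}_{-\frac{k_2}{2} + \frac32,\; j}(z) \Bigr\}_{j=0}^{\lfloor k_2/2 \rfloor} \]

with a $O(|1 - z|^{-\frac{k_2}{2} + 1 - \epsilon})$ remainder. When we take the Hadamard product of such a term with $\mathrm{Li}_{0, k_3}(z)$ we will get a linear combination, call it $S(z)$, of

\[ \Bigl\{ \mathrm{Li}_{-\frac{k_2}{2} + \frac32,\; j + k_3}(z) \Bigr\}_{j=0}^{\lfloor k_2/2 \rfloor} \]

with a remainder of $O(|1 - z|^{-\frac{k_2}{2} + 1 - 2\epsilon})$, which is $O(|1 - z|^{-\frac{k}{2} + \frac32 - 2\epsilon})$ unless $k_2 = k - 1$. When $k_2 = k - 1$, by Lemma 2.6 the constant multiplying the lead-order term $\mathrm{Li}_{-\frac{k}{2} + 2,\, \lfloor (k-1)/2 \rfloor}(z)$ is $\frac12 C_{k-1, 0}\, \mu_0^{(-\frac{k}{2} + 2,\, \lfloor (k-1)/2 \rfloor)}$. When we take the Hadamard product of this term with $\mathrm{Li}_{0, k_3}(z)$ we get a lead-order term of

\[ \frac{C_{k-1, 0}}{2}\, \mu_0^{(-\frac{k}{2} + 2,\, \lfloor (k-1)/2 \rfloor)}\, \mathrm{Li}_{-\frac{k}{2} + 2,\, \lfloor (k-1)/2 \rfloor + 1}(z). \]

Now we use Lemma 2.5 and the observation that $\lambda_0^{(\alpha, r)} \mu_0^{(\alpha, s)} = 1$ to conclude that the contribution to $R_k(z)$ from the term with $k_1 = 0$ and $k_2 = k - 1$ is

\[ \frac{k}{2} (1 - z)^{-\frac{k}{2} + 1} \sum_{j=0}^{\lfloor (k-1)/2 \rfloor + 1} D_{k, j}\, L^{\lfloor (k-1)/2 \rfloor + 1 - j}(z) + O(|1 - z|^{-\frac{k}{2} + \frac32 - 2\epsilon}), \tag{4.13} \]

with $D_{k, 0} = C_{k-1, 0}$. Notice that the lead order from this contribution is precisely that from (4.11) but with opposite sign; thus the two contributions cancel each other to lead order. The case $k_2 = 0$ and $k_1 \ge 2$ is handled symmetrically.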

The last two cases are $k_1 = 0$, $k_2 = 1$ (or vice versa) and $k_1 = k_2 = 0$. The contribution from these cases can be easily seen to be $O(|1 - z|^{-\frac{k}{2} + \frac32 - 2\epsilon})$.

We can now deduce the asymptotic behavior of $R_k(z)$. The three contributions are (4.9), (4.11), and (4.13), with only (4.9) (in net) contributing a term of the form $(1 - z)^{-\frac{k}{2} + 1} L^{\lfloor k/2 \rfloor}(z)$ when $k$ is even. The coefficient of this term when $k$ is even is given by

\[ \frac14 \sum_{\substack{0 < k_1 < k \\ k_1 \text{ even}}} \binom{k}{k_1} C_{k_1, 0}\, C_{k - k_1, 0}. \]

Finally we can sum up the rest of the contribution, define $C_{k,j}$ appropriately and use (3.6) to claim the result. □



4.4. A central limit theorem. Proposition 4.1 and singularity analysis allow us to get the asymptotics of the moments of the "approximately centered" shape functional. Using arguments identical to those in Section 3.3, it is clear that for $k \ge 2$

\[ \tilde{\mu}_n(k) = \frac{C_{k,0} \sqrt{\pi}}{\Gamma\bigl(\frac{k-1}{2}\bigr)}\, n^{k/2} [\log n]^{\lfloor k/2 \rfloor} + O\bigl( n^{k/2} [\log n]^{\lfloor k/2 \rfloor - 1} \bigr). \]

This and the asymptotics of the mean derived in Section 4.1 give us, for $k \ge 1$,

\[ \mathbf{E}\left[ \frac{\tilde{X}_n}{\sqrt{n \log n}} \right]^{2k} \to \frac{C_{2k,0} \sqrt{\pi}}{\Gamma(k - \frac12)}, \qquad \mathbf{E}\left[ \frac{\tilde{X}_n}{\sqrt{n \log n}} \right]^{2k-1} = o(1) \]

as $n \to \infty$. The recurrence for $C_{2k,0}$ can be solved easily to yield, for $k \ge 1$,

\[ C_{2k,0} = \frac{(2k)!\, (2k - 2)!}{2^k\, 2^{2k-2}\, k!\, (k - 1)!}\, \sigma^{2k}, \]

where $\sigma^2 := 8 (1 - \log 2)$. Then using the identity

\[ \frac{\Gamma(k - \frac12)}{\sqrt{\pi}} = \frac{1}{2^{2k-2}}\, \frac{(2k - 2)!}{(k - 1)!} \]

we get

\[ \frac{C_{2k,0} \sqrt{\pi}}{\Gamma(k - \frac12)} = \frac{(2k)!}{2^k\, k!}\, \sigma^{2k}. \]

It is clear now that both the "approximately centered" and the normalized shape functional are asymptotically normal.
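The algebra behind the last three displays can be confirmed exactly with rational arithmetic. The sketch below works in units where $\sigma^2 = 1$, iterates the recurrence of Proposition 4.1 for $C_{2k,0}$, and compares the result with the closed form and with the even moments $(2k)!/(2^k k!)$ of a standard normal. Function names are hypothetical, and the check covers only finitely many $k$.

```python
from fractions import Fraction
from math import comb, factorial


def lead_coefficients(k_max):
    """c[k] = C_{2k,0} / sigma^(2k), from C_{2l,0} = (1/4) sum_j binom(2l,2j) C_{2j,0} C_{2l-2j,0}."""
    c = {1: Fraction(1)}
    for l in range(2, k_max + 1):
        c[l] = Fraction(1, 4) * sum(comb(2 * l, 2 * j) * c[j] * c[l - j]
                                    for j in range(1, l))
    return c


def closed_form(k):
    """(2k)! (2k-2)! / (2^k 2^(2k-2) k! (k-1)!)."""
    return Fraction(factorial(2 * k) * factorial(2 * k - 2),
                    2 ** k * 2 ** (2 * k - 2) * factorial(k) * factorial(k - 1))


def gamma_ratio(k):
    """Gamma(k - 1/2) / sqrt(pi) as an exact rational number."""
    return Fraction(factorial(2 * k - 2), 4 ** (k - 1) * factorial(k - 1))


def normal_even_moment(k):
    """(2k)! / (2^k k!), the 2k-th moment of a standard normal variable."""
    return Fraction(factorial(2 * k), 2 ** k * factorial(k))


if __name__ == "__main__":
    c = lead_coefficients(8)
    for k in range(1, 9):
        print(k, c[k] == closed_form(k), c[k] / gamma_ratio(k) == normal_even_moment(k))
```

The agreement reflects the Catalan convolution identity hidden in the recurrence: dividing $C_{2k,0}$ by $(2k)!\,\sigma^{2k}$ leaves a Catalan number over a power of 2.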

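Theorem 4.2. Let $X_n$ denote the shape functional, induced by the toll sequence $(\log n)_{n \ge 1}$, for Catalan trees. Then

\[ \frac{X_n - C_0 (n + 1)}{\sqrt{n \log n}} \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma^2) \quad \text{and} \quad \frac{X_n - \mathbf{E} X_n}{\sqrt{\operatorname{Var} X_n}} \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1), \]

where

\[ C_0 := \sum_{n=1}^{\infty} (\log n) \frac{\beta_n}{4^n}, \]

and $\sigma^2 := 8 (1 - \log 2)$.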

Concerning numerical evaluation of the constant $C_0$, see the end of Section 5.2

in 

5. Sufficient conditions for asymptotic normality 

In this speculative final section we briefly examine the behavior of a general additive functional $X_n$ induced by a given "small" toll sequence $(b_n)$. We have seen evidence [Remark 3.6(d)] that if $(b_n)$ is the "large" toll sequence $n^{\alpha}$ for any fixed $\alpha > 0$, then the limiting behavior is non-normal. When $b_n = \log n$ (or $b_n = n^{\alpha}$ and $\alpha \downarrow 0$), the (limiting) random variable is normal. Where is the interface between normal and non-normal asymptotics? We have carried out arguments similar to those leading to Propositions 3.8 and 3.9 (see also [5]) that suggest a sufficient condition for asymptotic normality, but our "proof" is somewhat heuristic, and further technical conditions on $(b_n)$ may be required. Nevertheless, to inspire further work, we present our preliminary indications.

We assume that $b_n \equiv b(n)$, where $b(\cdot)$ is a function of a nonnegative real argument. Suppose that $x^{-3/2} b(x)$ is (ultimately) nonincreasing and that $x b'(x)$ is slowly varying at infinity. Then

\[ \mathbf{E} X_n = C_0 (n + 1) - (1 + o(1))\, 2\sqrt{\pi}\, n^{3/2}\, b'(n), \]

where

\[ C_0 := \sum_{n=1}^{\infty} b_n \frac{\beta_n}{4^n}. \]

Furthermore,

\[ \operatorname{Var} X_n \sim 8 (1 - \log 2)\, [n b'(n)]^2\, n \log n, \]

and

\[ \frac{X_n - C_0 (n + 1)}{n b'(n) \sqrt{n \log n}} \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma^2), \quad \text{where } \sigma^2 = 8 (1 - \log 2). \]

This asymptotic normality can also be stated in the form

\[ \frac{X_n - \mathbf{E} X_n}{\sqrt{\operatorname{Var} X_n}} \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1). \]